7 commits

Author SHA1 Message Date
  Rob Hallam 93b4317788 add draft agenda for meeting 2 4 months ago
  Rob Hallam 0075138a2d add note about and link to React PoC ('inquisitor') 4 months ago
  Rob Hallam aa82f3b756 [Meeting 1] add agenda and pre-meeting notes for 2024-06-26 4 months ago
  Rob Hallam 6f28ad7f50 add React resources 4 months ago
  Rob Hallam 85c9e30cb7 typos 5 months ago
  Rob Hallam f9dd5f473a add project preplanning - overview of pt q app & video highlights 5 months ago
  Rob Hallam 30c6f8905a add outcomes from Meeting 0 5 months ago
2 files changed, 229 insertions and 0 deletions
  1. meetings.org (+125, -0)
  2. project-preplanning.org (+104, -0)

meetings.org (+125, -0)

@@ -0,0 +1,125 @@
* 2024-06-18 1330: Meeting 0

Present:

- Matthew Barr (MB)
- Rob Hallam (RH)

** Project Choice

RH gave a quick (ish) pitch of several of the ideas listed:

- Queen Bee Finder :: interesting, but data access issues
- BorgBackup Visualiser :: lack of familiarity with Emacs Lisp for that portion
- FAIR principles `data monkey' :: the `fallback' option, though still interesting to RH
- Linux find fonts by descriptive tags :: slightly niche, and nebulous in the data gathering/scraping aspect
- Video Highlight Finder :: interesting but potentially tricky; more *research*-oriented _(A)_
- Cybersec Learning Game :: less well-defined in RH's head than it was, can the mechanics be interesting (MB mentioned the potential pitfall of `chocolate-covered broccoli', see [fn:chocobroc])
- Patient Questionnaire App :: best defined and circumscribed, plenty of potential to work with; more *development*-oriented _(B)_
- Google Fit exporter :: fairly standard, `speak to an API and do something interesting with the output' type project

Several may well be propitious.

*Outcome*: RH to investigate _(A)_ and _(B)_ further for prior art, feasibility etc and report back to MB within 1-2 days.


[fn:chocobroc] This seems to have originated from Prof Amy Bruckman's 1999 paper, /`Can Educational Be Fun?'/ -- ``Most attempts at making software both educational and fun end up being neither. Fun is often treated like a sugar coating to be added to an educational core. Which makes about as much sense as chocolate-dipped broccoli.'' [[https://faculty.cc.gatech.edu/~asb/papers/bruckman-gdc99.pdf][Can Educational Be Fun? (PDF)]]

** Additional Considerations

RH mentioned a recent bereavement which happened during exams, with the knock-on effects of needing to sit exams during summer diet in August, and having to tend to affairs (clear house, etc) over the coming weeks and months. MB highlighted Good Cause options. RH mentioned he had been in contact with Isabella Widger, who had provided useful support advice previously and who he felt comfortable approaching again if needed.

*Outcome*: RH to balance requirements, will keep MB apprised if necessary and contact support if needed

** AOB

MB's communication preference is email for anything needing specifically `actioned'.

* 2024-06-26 1530: Meeting 1

** Agenda

- [Questionnaire app] :: v basic React PoC done; other features seem implementable in React based on checking; next steps: user stories, plan UI
- [Gaming highlight generator] :: laughter-detection does find laughs (when targeted: some FPs and FNs, when not targeted: many FPs, unknown FNs)
- [Next steps] :: (app) pretty standard dev workflow, more focus on rapid prototypes & early feedback before formal user testing / evaluation later on; (highlights) focus on latter parts of `pipeline' first instead of feature detection- ie processing timestamps (consolidating them into clips), maybe UI for user to adjust clips (selection, times) before highlights made
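
As a rough illustration of the `consolidating timestamps into clips' step, here is a minimal sketch in Python. It assumes detections arrive as (start, end) pairs in seconds; the function name =consolidate= and the =max_gap= / =padding= defaults are hypothetical and not taken from the PoC.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds)


def consolidate(segments: List[Segment], max_gap: float = 2.0,
                padding: float = 1.0) -> List[Segment]:
    """Merge detections that are close together, then pad each clip.

    max_gap and padding are illustrative defaults, not tuned values.
    As long as padding <= max_gap / 2, padded clips cannot overlap.
    """
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous detection: extend that clip.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    # Pad each clip so it does not begin or end abruptly; clamp at zero.
    return [(max(0.0, s - padding), e + padding) for s, e in merged]
```

The resulting list could then be shown to the user for adjustment (selection, times) before any video is cut.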

** Pre-Meeting

*** Very Basic React App Proof of Concept

I decided to try out React, which I hadn't used before, just to see if I could get something that read in a questionnaire from JSON and presented it as a questionnaire.

http://inquisitor.roberthallam.org

Screenshot:

[[file:~/downloads/inquisitor-demo-example1.png]]

(alternative link)

*** Highlight Generator Pipeline / Workflow

[[file:~/downloads/highlightgeneration-process.svg]]

([[https://roberthallam.com/files/highlightgeneration-process.svg][alternative link]])

Not much to look at as it's only a few hours' work but lays the foundation for rapid prototyping / iterating once design is in place!

*** Laughter Detection

Tricky mix of specific package versions needed to get this working in 2024! Also needs a minor change (1 line) due to librosa API update. Can also be run on Colab:

[[file:~/downloads/colab-laughdetect.png]]

([[https://roberthallam.com/files/colab-laughdetect.png][alternative link]])

Observations:

- running on a ~5 minute audio clip in AAC format takes ~30s (so a 3 hour video would take ~18 minutes)
- qualitative observation: default parameters have a reasonable mix of detecting obvious laughs with a small number of FPs (and seem to have a few FNs too)
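
The 18 minute figure is a straight linear extrapolation from the ~30s observation. A small sketch of that arithmetic, assuming runtime scales linearly with audio length (which has not been verified on longer files):

```python
def estimate_runtime_seconds(audio_seconds: float,
                             sample_seconds: float = 5 * 60,
                             sample_runtime_seconds: float = 30.0) -> float:
    """Scale the observed runtime linearly to a longer recording."""
    return audio_seconds / sample_seconds * sample_runtime_seconds


three_hours = 3 * 60 * 60
print(estimate_runtime_seconds(three_hours) / 60)  # minutes, prints 18.0
```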

To test this I generated five audio clips- four were five minutes in duration, selected from longer clips, intended to be representative of obvious laughter and of non-obvious / subtle laughter. The final audio clip was the full audio track of around 2½ hours.

Results

| Index | Duration | Context | № Detected | FPs | Comments |
|-------+----------+------------------------------------------------------+------------+-----+--------------------------------------------------------------------------|
| 1 | 5:06 | Multiple laughs from different speakers around ~1min | 15 | 3 | Seems inconsistent in detection when laughter is ongoing and overlapping |
| 2 | 5:08 | Mostly discussion, couple chuckles etc | 0 | N/A | Arguably ~4/5 FNs |
| 3 | 5:10 | TBC | 8 | 5 | Detected segments are short |
| 4 | 5:11 | One bit of obvious laughter | 2 | 1 | Detects the obvious bit of laughter |
| 5 | 2:37:11 | Full-length video of gaming session | 74 | 65 | Quite a lot of FPs! |

/Note: Clips are not exactly five minutes due to the way ffmpeg cuts when doing a stream copy/
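
To put the FP counts in the table into perspective, the fraction of detections that were real laughs (precision) can be worked out per clip. A small sketch, using only the `№ Detected' and `FPs' columns; recall cannot be computed the same way because FNs were not counted.

```python
# (index, detected, false_positives) copied from the results table.
# Clip 2 is left out: nothing was detected, so precision is undefined.
RESULTS = [(1, 15, 3), (3, 8, 5), (4, 2, 1), (5, 74, 65)]


def precision(detected: int, false_positives: int) -> float:
    """Fraction of detections that were real laughs."""
    return (detected - false_positives) / detected


for index, detected, fps in RESULTS:
    print(f"clip {index}: precision {precision(detected, fps):.2f}")
```

This gives roughly 0.80 for the targeted clip 1 but only about 0.12 for the full-length clip 5.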

The results of that testing suggest two things about the default parameters:

- laughter can be detected, even when it's coming from multiple speakers
- those parameters produce a lot of FPs when not targeted

Given that, a two-pass approach might yield better results.


* 2024-07-11 1400: Meeting 2

** Agenda

/(note: GitLab has been down since Monday)/

For scoping, done:
- conceptual overview of pipeline
- class sketch
- some user story cards (more TBD)

Still to do:
- plan UI
- collect and test other 'feature extractors' than laughter-detection

Also done:
- simple proof-of-concept of highlight pipeline that can take input videos and produce output
- ffmpeg invocation is a bit slow, might change approach

Next steps for coming week:
- replace PoC version (turn class sketch into code)
- get UI plan done
- begin integrating more feature extractors

project-preplanning.org (+104, -0)

@@ -0,0 +1,104 @@
#+LATEX_HEADER: \usepackage[a4paper]{geometry}
#+LATEX_HEADER: \usepackage{parskip}
#+LaTeX_HEADER: \hypersetup{linktoc = all, colorlinks = true, urlcolor = blue, citecolor = PaleGreen1, linkcolor = black}
* Patient Questionnaire App

** Elevator Pitch

A simple app to let patients fill in relevant questionnaires -- eg DLQI & POEM for dermatology -- as an aid in monitoring their condition and facilitating discussion with clinicians.

** Prior Art

Apps exist for both of the proposed questionnaires:

- [[https://play.google.com/store/apps/details?id=uk.ac.cardiff.dlqi][DLQI app by Cardiff University]]
- [[https://play.google.com/store/apps/details?id=my.eczema.tracker][My Eczema Tracker]] (MET)

I have downloaded and tried out both apps.

** How This Project Might Proceed

While apps exist for the two questionnaires mentioned in the `pitch', there is still scope to do work which will improve upon what exists:

- combined app :: most simply, there are two apps, having a single app for both questionnaires is surely preferable (/stretch goal/: define questionnaires in eg JSON, the app can dynamically expand without needing updated)
- encryption :: even though the data is stored locally, it is still worthwhile to encrypt it at rest for privacy reasons
- export‡ :: being able to export eg a PDF or similar would be a boon for sharing with clinicians (eg this could be printed off to be added to notes); other formats are possible too
- patient notes† :: being able to take free text notes, which could be associated with a questionnaire or be `freestanding', would aid memory for patients in consultations
- graphing† :: since these questionnaires usually produce a score, this could be charted over time (handy for spotting patterns)
- reminders† :: a periodic notification (eg weekly, bi-weekly, monthly) would help remind patients to track their symptoms
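
To make the `define questionnaires in JSON' stretch goal more concrete, here is a minimal sketch in Python (chosen for brevity; the app itself would do the equivalent in its own framework). The JSON layout, field names and question text are all hypothetical and are not taken from DLQI, POEM or either existing app.

```python
import json

# Hypothetical questionnaire definition; every field name is illustrative.
QUESTIONNAIRE_JSON = """
{
  "id": "example",
  "title": "Example questionnaire",
  "questions": [
    {
      "id": "q1",
      "text": "Example question one?",
      "options": [
        {"label": "Not at all", "score": 0},
        {"label": "A little", "score": 1},
        {"label": "A lot", "score": 2}
      ]
    },
    {
      "id": "q2",
      "text": "Example question two?",
      "options": [
        {"label": "No", "score": 0},
        {"label": "Yes", "score": 1}
      ]
    }
  ]
}
"""


def total_score(questionnaire: dict, answers: dict) -> int:
    """Sum the scores of the chosen options.

    answers maps a question id to the index of the selected option.
    """
    total = 0
    for question in questionnaire["questions"]:
        choice = answers[question["id"]]
        total += question["options"][choice]["score"]
    return total


questionnaire = json.loads(QUESTIONNAIRE_JSON)
print(total_score(questionnaire, {"q1": 2, "q2": 1}))  # prints 3
```

Keeping the scores in the definition means a new questionnaire could be added as data, and the same total could feed the graphing feature.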

This project would seek to put together an app using a simple mobile framework (eg JQueryUI, Cordova, etc) that implements as many of the features above as is feasible. Usability feedback could be sought from i) general users ii) clinicians.

†: The My Eczema Tracker app looks like it has these features from its screenshots

‡: My Eczema Tracker seems to also offer this: ``You can also download all your results to your device for you to review or share with your healthcare professional.'' but it is unclear what format this is (seems to be CSV)

While MET already has some features I would like to implement for the other questionnaire, it does have a minor usability irk insofar as the user needs to scroll to hit `next' (tested with an old OnePlus3).

Rough outline:

1. early phase: design key parts of the app (user story cards, MoSCoW, etc), investigate & decide which app framework to use
2. mid phase: evolve & refine prototype & design usability tests, seek input
3. late phase: user testing, demo & write-up

** Resources

*** React

- [[https://github.com/wix/react-native-notifications][React native notifications]] || [[https://github.com/zo0r/react-native-push-notification][react-native-push-notification]]
- [[https://mui.com/joy-ui/getting-started/usage/][JoyUI]]
- [[https://tailwindcss.com/docs/guides/create-react-app][TailwindCSS & React]] (didn't find styling I was happy with for radio buttons)
- [[https://ui.shadcn.com/docs/components/form][shadcn forms]] (dep: [[https://zod.dev/][Zod]])
- [[https://www.reddit.com/r/reactjs/comments/1d2ptx2/what_ui_frameworks_do_yall_use_or_recommend/][recent reddit discussion on UIs]]
- [[https://web.dev/articles/what-are-pwas][PWA overview]]
- export to PDF: [[https://react-pdf.org/][React-pdf]]

* Automatic Video Game Footage Highlight Generator

** Elevator Pitch

Quite often the full video is less interesting than the highlights -- this might be a funny moment, an intense moment, etc -- but scanning through footage for these is a time-consuming and boring task; could some simple heuristics do a reasonable job of finding parts of a full video to use for highlights?

** General Approach

Where the video is quiet it is unlikely to be a highlight. Interesting bits might be:

1. where the *audio level peaks* (eg someone speaking under stress, multiple people speaking, loud part of a game)
2. where there is *laughter*, something funny probably happened or was said
3. where there is lots of *motion* something interesting may be happening

To find these points:

1. there surely exist tools for absolute loudness detection (and perhaps perceptual loudness)
2. laughter detection might be feasible by training or tuning a model
3. motion could be detected by parts of the video where the bitrate increases (if VBR) or where encoding artifacts are more prominent (if CBR) -- not sure if the latter are detectable programmatically
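
For the first of these, a dependency-free sketch of what `loudness detection' might look like: compute the RMS level of fixed-size frames and flag the ones well above average. This is only an illustration. The function names, the non-overlapping framing and the =factor= threshold are assumptions, and a real version would more likely use an existing audio library.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_length: int) -> List[float]:
    """Root-mean-square level of each complete, non-overlapping frame."""
    levels = []
    for start in range(0, len(samples) - frame_length + 1, frame_length):
        frame = samples[start:start + frame_length]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_length))
    return levels


def loud_frames(levels: Sequence[float], factor: float = 2.0) -> List[int]:
    """Indices of frames louder than `factor` times the mean level."""
    if not levels:
        return []
    mean_level = sum(levels) / len(levels)
    return [i for i, level in enumerate(levels) if level > factor * mean_level]


# Toy signal: three quiet frames and one loud frame (index 2).
samples = [0.1] * 8 + [1.0] * 4 + [0.1] * 4
print(loud_frames(frame_rms(samples, frame_length=4)))  # prints [2]
```

Frame indices convert to timestamps by multiplying by =frame_length= and dividing by the sample rate.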

** Prior Art

- laugh detector :: [[https://github.com/jrgillick/laughter-detection][jrgillick/laughter-detection]] -- from [fn:laughdetect]
- loudness detection :: looks like python [[https://librosa.org/doc/main/index.html][librosa]] ([[https://librosa.org/doc/main/generated/librosa.feature.rms.html][RMS function]]) is [[https://stackoverflow.com/a/73255652][an option]]
- detecting multiple speakers :: this is part of /speaker diarisation/, which there are options for (eg [[https://cmusphinx.github.io/wiki/speakerdiarization/][LIUM / CMUSphinx]], [[https://github.com/pyannote/pyannote-audio][pyannote-audio]] (NB needs HuggingFace token), etc)

[fn:laughdetect] 2021 Jon Gillick, Wesley Deng, Kimiko Ryokai, and David Bamman, *``Robust Laughter Detection in Noisy Environments.''* INTERSPEECH [[https://www.isca-archive.org/interspeech_2021/gillick21_interspeech.pdf][(PDF link)]]

** Potential Pitfalls

There are a few caveats:

- I don't know how well the `laughter detection' model works with the sample data (ie my own video files)
- I don't know the first thing about training a model or tuning one (I suspect I would need several thousands of samples to train, perhaps fewer to tune?)
- I lack hardware for tuning (my GPU is ancient and doesn't have features of newer GPUs)
- Working with video files can be quite slow in general

** How This Project Might Proceed

There are two main `strands' to this:

- the techniques for finding good highlights
- using these to actually generate videos either automatically or semi-automatically

So the approach could be:

1. early phase: write a short script or two that uses ffmpeg to extract ROI from videos (trivial); find out if there are other pre-trained audio models which could be tuned; get laughter-detection, librosa &co set up and see if they produce useful output
2. mid phase: refine- ie try and make process faster and more accurate
3. late phase: tidy up- make process more user friendly (options: completely automated output; generate several and let user pick which to keep), write up
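
The `short script that uses ffmpeg to extract ROI' from the early phase could start from something like the sketch below. It only builds the command; the helper name and file names are hypothetical. It uses stream copy (=-c copy=) for speed, which means cuts snap to keyframes and clips will not be exactly the requested length.

```python
from typing import List


def ffmpeg_cut_command(source: str, start: float, end: float,
                       output: str) -> List[str]:
    """Build an ffmpeg command that copies [start, end] without re-encoding."""
    if end <= start:
        raise ValueError("end must be after start")
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",  # seek to the clip start (input option)
        "-to", f"{end:.3f}",    # stop reading the input at this position
        "-i", source,
        "-c", "copy",           # stream copy: fast, but cuts land on keyframes
        output,
    ]


command = ffmpeg_cut_command("session.mp4", 12.5, 20.0, "clip-001.mp4")
print(" ".join(command))
```

The list can be passed to =subprocess.run(command, check=True)= to actually produce the clip, assuming ffmpeg is on the =PATH=.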
