|
|
@@ -35,3 +35,52 @@ RH mentioned a recent bereavement which happened during exams, with the knock-on |
|
|
|
|
|
|
|
MB's preferred communication channel is email for anything that specifically needs to be `actioned'. |
|
|
|
|
|
|
|
* 2024-06-26 1530: Meeting 1 |
|
|
|
|
|
|
|
** Agenda |
|
|
|
|
|
|
|
- [Questionnaire app] :: very basic React PoC done; from an initial check, the other features seem implementable in React; next steps: user stories, plan UI |
|
|
|
- [Gaming highlight generator] :: laughter-detection does find laughs (when targeted: some FPs and FNs; when not targeted: many FPs, unknown FNs) |
|
|
|
- [Next steps] :: (app) pretty standard dev workflow, with more focus on rapid prototypes & early feedback before formal user testing / evaluation later on; (highlights) focus on the latter parts of the `pipeline' first instead of feature detection, i.e. processing timestamps (consolidating them & turning them into clips), maybe a UI for the user to adjust clips (selection, times) before highlights are made |
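
The timestamp-processing step mentioned under next steps could look something like the minimal sketch below. It assumes detections arrive as (start, end) pairs in seconds. The function name =consolidate=, the =max_gap= and =padding= parameters, and their default values are all hypothetical and are not taken from the project.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def consolidate(segments: List[Segment], max_gap: float = 5.0,
                padding: float = 2.0,
                duration: float = float("inf")) -> List[Segment]:
    """Merge detections that are close together and pad them into clips.

    max_gap, padding and duration are illustrative assumptions.
    """
    clips: List[Segment] = []
    for start, end in sorted(segments):
        # Pad each detection, clamped to the bounds of the recording.
        start = max(0.0, start - padding)
        end = min(duration, end + padding)
        if clips and start - clips[-1][1] <= max_gap:
            # Close enough to the previous clip: extend it.
            clips[-1] = (clips[-1][0], max(clips[-1][1], end))
        else:
            clips.append((start, end))
    return clips


# Two nearby detections collapse into one clip; the distant one stays separate.
print(consolidate([(10.0, 12.0), (15.0, 16.0), (60.0, 61.0)]))
```

A clip-adjustment UI would then only need to edit the (start, end) pairs this returns before any video is cut.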
|
|
|
|
|
|
|
** Pre-Meeting |
|
|
|
|
|
|
|
*** Highlight Generator Pipeline / Workflow |
|
|
|
|
|
|
|
[[file:~/downloads/highlightgeneration-process.svg]] |
|
|
|
|
|
|
|
([[https://roberthallam.com/files/highlightgeneration-process.svg][alternative link]]) |
|
|
|
|
|
|
|
*** Laughter Detection |
|
|
|
|
|
|
|
Tricky mix of specific package versions needed to get this working in 2024! Also needs a minor change (1 line) due to librosa API update. Can also be run on Colab: |
|
|
|
|
|
|
|
[[file:~/downloads/colab-laughdetect.png]] |
|
|
|
|
|
|
|
([[https://roberthallam.com/files/colab-laughdetect.png][alternative link]]) |
|
|
|
|
|
|
|
Observations: |
|
|
|
|
|
|
|
- running on a ~5 minute audio clip in AAC format takes ~30s (so a 3 hour video would take ~18 minutes) |
|
|
|
- qualitative observation: the default parameters strike a reasonable balance, detecting obvious laughs with a small number of FPs (and there seem to be a few FNs too) |
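
The runtime figure in the first observation is a simple linear extrapolation, sketched below. It assumes processing time scales linearly with audio duration, which has not been checked here. The function name is illustrative.

```python
def estimate_runtime_s(audio_s: float, ref_audio_s: float = 5 * 60,
                       ref_runtime_s: float = 30.0) -> float:
    """Linearly extrapolate processing time from one measured reference run.

    Defaults reflect the observation above: ~5 minutes of audio in ~30 s.
    """
    return audio_s / ref_audio_s * ref_runtime_s


three_hours_s = 3 * 60 * 60
print(estimate_runtime_s(three_hours_s) / 60)  # estimated minutes for a 3 hour video
```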
|
|
|
|
|
|
|
To test this I generated five audio clips. Four were around five minutes long, cut from longer recordings and intended to be representative of both obvious laughter and non-obvious / subtle laughter. The fifth was the full audio track of a session, around 2½ hours long. |
|
|
|
|
|
|
|
Results |
|
|
|
|
|
|
|
| Index | Duration | Context | № Detected | FPs | Comments | |
|
|
|
|-------+----------+------------------------------------------------------+------------+-----+--------------------------------------------------------------------------| |
|
|
|
| 1 | 5:06 | Multiple laughs from different speakers around ~1min | 15 | 3 | Seems inconsistent in detection when laughter is ongoing and overlapping | |
|
|
|
| 2 | 5:08 | Mostly discussion, couple chuckles etc | 0 | N/A | Arguably ~4/5 FNs | |
|
|
|
| 3 | 5:10 | TBC | 8 | 5 | Detected segments are short | |
|
|
|
| 4 | 5:11 | One bit of obvious laughter | 2 | 1 | Detects the obvious bit of laughter | |
|
|
|
| 5 | 2:37:11 | Full-length video of gaming session | 74 | 65 | Quite a lot of FPs! | |
|
|
|
|
|
|
|
/Note: Clips are not exactly five minutes due to the way ffmpeg cuts when doing a stream copy/ |
|
|
|
|
|
|
|
The results of that testing suggest two things about the default parameters: |
|
|
|
|
|
|
|
- laughter can be detected, even when it's coming from multiple speakers |
|
|
|
- those parameters produce a lot of FPs when not targeted, i.e. when run over a full-length track rather than a clip chosen for its laughter |
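
To put a rough number on the second point, the counts in the results table can be turned into a per-clip precision, (detected - FPs) / detected. This is a sketch using only the figures from the table. Recall is left out because FNs were not counted exactly.

```python
# (index, detected, false positives) copied from the results table.
RESULTS = [(1, 15, 3), (2, 0, None), (3, 8, 5), (4, 2, 1), (5, 74, 65)]


def precision(detected, false_positives):
    """Fraction of detections that were real laughs; None if nothing was detected."""
    if not detected or false_positives is None:
        return None
    return (detected - false_positives) / detected


for index, detected, fps in RESULTS:
    p = precision(detected, fps)
    print(index, "n/a" if p is None else f"{p:.0%}")
```

On these counts precision is about 80% for clip 1 but only about 12% for the full-length track (9 of 74 detections).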
|
|
|
|
|
|
|
Given that, a two-pass approach might yield better results. |