
[Meeting 1] add agenda and pre-meeting notes for 2024-06-26

main
Rob Hallam, 4 months ago
Parent
Commit
aa82f3b756
1 changed file with 49 additions and 0 deletions
meetings.org

@@ -35,3 +35,52 @@ RH mentioned a recent bereavement which happened during exams, with the knock-on

MB's communication preference is email for anything needing specifically `actioned'.

* 2024-06-26 1530: Meeting 1

** Agenda

- [Questionnaire app] :: v basic React PoC done; other features seem implementable in React based on checking; next steps: user stories, plan UI
- [Gaming highlight generator] :: laughter-detection does find laughs (when targeted: some FPs and FNs, when not targeted: many FPs, unknown FNs)
- [Next steps] :: (app) pretty standard dev workflow, with more focus on rapid prototypes & early feedback before formal user testing / evaluation later on; (highlights) focus on the latter parts of the `pipeline' first instead of feature detection, i.e. processing timestamps (consolidating them into clips), maybe a UI for the user to adjust clips (selection, times) before highlights are made
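
To make the "consolidating into clips" step concrete, here is a minimal sketch of one way it could work: detections that sit close together are merged, then each merged span is padded. The function name, the =max_gap= and =padding= values, and the example timestamps are all assumptions for illustration, not decisions that have been made.

```python
def consolidate(segments, max_gap=5.0, padding=2.0):
    """Merge detections closer than max_gap seconds, then pad each merged clip.

    segments: iterable of (start, end) times in seconds.
    max_gap and padding are hypothetical defaults, not tuned values.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous detection: extend that clip.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Pad both sides, never starting before the beginning of the recording.
    return [(max(0.0, s - padding), e + padding) for s, e in merged]


# Made-up example timestamps (seconds), not real detector output.
detections = [(61.0, 62.5), (63.0, 66.0), (120.0, 121.0)]
clips = consolidate(detections)
print(clips)
```

Keeping =max_gap= larger than twice =padding= means padded clips cannot overlap each other, which would simplify any later UI for adjusting clip selection and times.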

** Pre-Meeting

*** Highlight Generator Pipeline / Workflow

[[file:~/downloads/highlightgeneration-process.svg]]

([[https://roberthallam.com/files/highlightgeneration-process.svg][alternative link]])

*** Laughter Detection

Tricky mix of specific package versions needed to get this working in 2024! Also needs a minor change (1 line) due to librosa API update. Can also be run on Colab:

[[file:~/downloads/colab-laughdetect.png]]

([[https://roberthallam.com/files/colab-laughdetect.png][alternative link]])

Observations:

- running on a ~5 minute audio clip in AAC format takes ~30s (so a 3 hour video would take ~18 minutes)
- qualitative observation: default parameters give a reasonable mix, detecting obvious laughs with a small number of FPs (and seem to have a few FNs too)
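
The 18-minute figure is a straight linear extrapolation from the single ~5 minute / ~30s observation. A small sketch of that arithmetic, assuming processing time scales linearly with audio length (which has not been verified on long files):

```python
def estimate_processing_seconds(audio_seconds,
                                sample_audio_seconds=300.0,
                                sample_processing_seconds=30.0):
    """Linearly scale the observed timing (~30s per ~5 min of audio)."""
    return audio_seconds * (sample_processing_seconds / sample_audio_seconds)


three_hours = 3 * 60 * 60
estimate_minutes = estimate_processing_seconds(three_hours) / 60
print(f"{estimate_minutes:.0f} minutes")
```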

To test this I generated five audio clips. Four were five minutes in duration, cut from longer clips and intended to be representative of obvious laughter and of non-obvious / subtle laughter. The final audio clip was the full audio track of around 2½ hours.

Results:

| Index | Duration | Context | № Detected | FPs | Comments |
|-------+----------+------------------------------------------------------+------------+-----+--------------------------------------------------------------------------|
| 1 | 5:06 | Multiple laughs from different speakers around ~1min | 15 | 3 | Seems inconsistent in detection when laughter is ongoing and overlapping |
| 2 | 5:08 | Mostly discussion, couple chuckles etc | 0 | N/A | Arguably ~4/5 FNs |
| 3 | 5:10 | TBC | 8 | 5 | Detected segments are short |
| 4 | 5:11 | One bit of obvious laughter | 2 | 1 | Detects the obvious bit of laughter |
| 5 | 2:37:11 | Full-length video of gaming session | 74 | 65 | Quite a lot of FPs! |

/Note: Clips are not exactly five minutes due to the way ffmpeg cuts when doing a stream copy/
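
For reference, a sketch of the kind of stream-copy cut that gives slightly-off durations. With =-c copy= ffmpeg does not re-encode, so it can only cut on existing packet (for video, keyframe) boundaries. The file names and start time here are hypothetical, and this is not necessarily the exact command used for the clips above.

```python
import subprocess


def build_cut_command(source, start, duration, output):
    """Build an ffmpeg command that cuts a clip without re-encoding."""
    return [
        "ffmpeg",
        "-ss", str(start),      # seek to the start position (seconds or HH:MM:SS)
        "-i", source,
        "-t", str(duration),    # requested clip length in seconds
        "-c", "copy",           # stream copy: no re-encode, so cuts are not exact
        output,
    ]


# Hypothetical input/output names, for illustration only.
command = build_cut_command("session.mkv", "00:10:00", 300, "clip1.mkv")
print(" ".join(command))
# subprocess.run(command, check=True)  # uncomment to run; needs ffmpeg on PATH
```

Re-encoding instead of copying would give exact five-minute clips, at the cost of extra processing time.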

The results of that testing suggest two things about the default parameters:

- laughter can be detected, even when it's coming from multiple speakers
- those parameters produce a lot of FPs when not targeted
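
To put a number on "a lot of FPs", the detected and FP counts from the table can be turned into a per-clip precision (the share of detections that were real laughs). This is only a sketch over the five test clips. Clip 2 is left out because nothing was detected, and recall is not computed because FNs were not counted.

```python
# (index, detected, false positives), copied from the results table.
# Clip 2 is omitted: zero detections, so precision is undefined.
results = [(1, 15, 3), (3, 8, 5), (4, 2, 1), (5, 74, 65)]


def precision(detected, false_positives):
    """Fraction of detections that were real laughs."""
    return (detected - false_positives) / detected


for index, detected, fps in results:
    print(f"clip {index}: precision {precision(detected, fps):.2f}")
```

On these counts the targeted clip 1 comes out at 0.8, while the untargeted full-length clip 5 is around 0.12.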

Given that, a two-pass approach might yield better results.
