Functional tests for WordFeatureExtractor consist of making sure it can find
words known in advance. The Harvard Sentences [1] are a useful means of doing
that. These are 'standard sentences' that are used for speech quality
measurements, and so would be decent candidates for assessing word recognition.
The Open Speech Repository [2] has samples of sentences to download.
In testing, the Whisper medium model had trouble with a few words:
- glue
- well
- punch
- truck
I'm not sure why. Even when I recorded myself speaking the Harvard sentences in
higher quality (the OSR files are in the 8kHz range) it still did not recognise
these words. A separate functional test of only those words was added as a
result. This would be worth exploring in more detail if there were time.
[1]: See e.g. https://www.cs.columbia.edu/~hgs/audio/harvard.html
[2]: https://www.voiptroubleshooter.com/open_speech/index.html
Calls pulled out relate to setup and working of Whisper:
- _whispermodel()
- _batched_inference_pipeline()
- _transcribe()
Defaults defined: model, device, compute type, beam size, batch size, pipeline type
Tests:
- basic init
- init with no media
- run() with no words (early exit 0 Features)
- run() with mocked transcribe
NOTE: these are unit tests and do not exercise Whisper
BREAKING CHANGE: passing no words to WFE is no longer an error; it logs a notice
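
A minimal sketch of how the "no words" early exit and the mocked-transcribe
test could look. WordFeatureExtractorSketch is a hypothetical stand-in, not the
real class: the tuple format returned by _transcribe() and the use of
logging.info for the notice are assumptions made for illustration.

```python
import logging
from unittest import mock

logger = logging.getLogger("wfe_sketch")


class WordFeatureExtractorSketch:
    """Hypothetical stand-in for WordFeatureExtractor (not the real class)."""

    def __init__(self, words=None):
        self.words = [w.lower() for w in (words or [])]

    def _transcribe(self, path):
        # The real method would load and run Whisper; unit tests mock this out.
        raise RuntimeError("Whisper is not exercised in unit tests")

    def run(self, path):
        if not self.words:
            # Assumption: the "notice" is an info-level log message.
            logger.info("no target words given; nothing to match")
            return []
        # Assumption: _transcribe() yields (start, end, word) tuples.
        return [
            (start, end, word)
            for start, end, word in self._transcribe(path)
            if word.lower() in self.words
        ]


def run_with_mocked_transcribe(fe, path, transcript):
    """Run fe on path with _transcribe() replaced by a canned transcript."""
    with mock.patch.object(fe, "_transcribe", return_value=transcript) as fake:
        result = fe.run(path)
    return result, fake
```

Patching _transcribe() on the instance keeps the test free of any Whisper
import or model load, which is the slow part.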
WordFeatureExtractor is not fast; even the import is slow. However, it processes
files and returns Features corresponding to matched words.
WhisperFE will be slightly different to other FEs in that there is/are specific
target words to be searched for. Not specifying these could be an error (this
commit specifies this as such) but a better approach may be to downgrade that to
a (logging) notice, and simply match nothing / early exit.
Uses a manually-crafted video with laughter between 15s and 20s.
Test takes LaughFE's internal Feature time adjustment into account (see related
commit).
Note: very slow test
@see: df3c559
To help functional testing, LaughFE's internal adjustment times are exposed.
Recap: when a laugh is detected by LaughFE, the time of the laugh itself is not
used directly; instead, the resulting Feature has some time prepended to try to
capture the thing that caused the laugh.
When functional testing the FEs we set up specially-crafted videos with features
at known points. To test LaughFE correctly, the expected times in the tests are
shifted by the same amount of time the FE adjusts by, so that the tests check
the intended behaviour.
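
A rough sketch of how a test might compute the expected Feature times from the
known laugh times. PREPEND_TIME and its value are hypothetical (the real name
and amount exposed by the FE may differ), and clamping at zero is an assumption.

```python
# Hypothetical name and value; the attribute exposed by the real FE may differ.
PREPEND_TIME = 7.0


def expected_interval(laugh_start, laugh_end, prepend_time=PREPEND_TIME):
    """Return the (start, end) a test should expect for a laugh at known times."""
    # The Feature starts earlier than the laugh itself.
    # Assumption: the start is never moved before the beginning of the video.
    return (max(0.0, laugh_start - prepend_time), laugh_end)
```

With the sample video's laugh at 15-20s and a 7s adjustment, the test would
expect a Feature spanning 8-20s rather than 15-20s.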
@see:
- feature_extractors.py::LaughterFeatureExtractor
Functional tests for LoudAudioFeatureExtractor
Currently uses one manually-generated video with blank audio except between
15-20s where 1-2 sine tones are present
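
For illustration only, an audio track of that shape could be synthesised along
these lines. The sample video was made by hand, so this is not how it was
produced; the function name, frequency and sample rate are all assumptions.

```python
import math


def tone_track(duration=30.0, tone_start=15.0, tone_end=20.0,
               freq=440.0, rate=8000):
    """Return mono samples: silence, except a sine tone in [tone_start, tone_end)."""
    samples = []
    for n in range(int(duration * rate)):
        t = n / rate
        if tone_start <= t < tone_end:
            samples.append(math.sin(2 * math.pi * freq * t))
        else:
            samples.append(0.0)
    return samples
```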
Only has a single property at present: SAMPLE_DIR for the path to where
sample videos are stored
TestVideoActivityFEFunctional now inherits from this instead of unittest.TestCase
Problems fixed:
- Feature repr did not include feature_extractor since that API was changed
- Intervals that were equivalent were not equal, so Features were not properly
sorted or equal
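
As an illustration of the second fix, value-based equality and ordering can be
had from a dataclass. This Interval is a hypothetical stand-in with assumed
field names; the project's own class may be implemented differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Interval:
    """Hypothetical stand-in: compares and sorts by (start, end)."""

    start: float
    end: float
```

Two Intervals with the same bounds then compare equal, and a list of them sorts
by start time and then end time.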
This was done so the collection of loudnesses from an audio file could be mocked
in testing, but it also improves readability
TODO: review number of params and consider further refactoring
Outputs a representation of the Features extracted by the pipeline
Intent is to write a FE that takes the output so that a pipeline can be
're-run'
Output JSON could also be used with external tools
This paves the way for parts of the pipeline that do not produce videos,
such as JSON, images, clips etc
TODO: rename module video_producers → producers
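
A minimal sketch of the JSON round trip that a 're-run' would rely on. The
Feature dataclass here is a hypothetical stand-in with assumed fields, and the
function names are made up for illustration; the real output format may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Feature:
    """Hypothetical stand-in for the pipeline's Feature (assumed fields)."""

    start: float
    end: float
    source: str
    feature_extractor: str


def features_to_json(features):
    """Serialise Features to a JSON array of objects."""
    return json.dumps([asdict(f) for f in features], indent=2)


def features_from_json(text):
    """Rebuild Features from JSON written by features_to_json()."""
    return [Feature(**item) for item in json.loads(text)]
```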
Take the mean of non-overlapping windows of scores
Input: list of tuples in the format (time, score)
Output: list of tuples in the format (time, mean_score)
(reduced set)
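
A sketch of the reduction described above, under two assumptions that the
message does not state: windows are defined by a fixed number of samples, and
each window is labelled with the time of its first sample. The function name
and window_size parameter are hypothetical.

```python
from statistics import fmean


def window_means(scores, window_size):
    """Reduce (time, score) tuples to one (time, mean_score) per window."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    reduced = []
    for start in range(0, len(scores), window_size):
        window = scores[start:start + window_size]
        # Assumption: use the time of the first sample in the window.
        time = window[0][0]
        reduced.append((time, fmean(score for _, score in window)))
    return reduced
```

A trailing window shorter than window_size is averaged over the samples it has.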