Found out that Whisper throws a hissy fit in the form of a RuntimeError if
there is no speech in the audio. We should consider catching this.
> RuntimeError: stack expects a non-empty TensorList
> stdout: "No active speech found in audio"
For the moment we can check that speech-free audio throws this error and leave catching it as a TODO
Functional tests for WordFeatureExtractor consist of making sure it can find
words known in advance. The Harvard Sentences [1] are a useful means of doing
that. These are 'standard sentences' that are used for speech quality
measurements, and so would be decent candidates for assessing word recognition.
The Open Speech Repository [2] has samples of sentences to download.
In testing, the Whisper medium model had trouble with a few words:
- glue
- well
- punch
- truck
I'm not sure why. Even when I recorded myself speaking the Harvard sentences in
higher quality (OSR files are in the 8 kHz range) it would still not recognise these
words. A separate functional test of only those words was added as a result.
This would perhaps be worth exploring in more detail if there was time.
[1]: See eg https://www.cs.columbia.edu/~hgs/audio/harvard.html
[2]: https://www.voiptroubleshooter.com/open_speech/index.html
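The word check the functional tests need can be kept in a small helper. This is a sketch, assuming transcripts arrive as plain strings; the helper name is made up, and the sample sentence is Harvard list 1:

```python
import re

def missing_words(transcript, expected):
    """Return the expected words that do not appear in the transcript.

    Normalises case and strips punctuation so "Glue," matches "glue",
    which matters when checking the problem words listed above.
    """
    found = set(re.findall(r"[a-z']+", transcript.lower()))
    return sorted(set(expected) - found)
```

A functional test can then assert `missing_words(transcript, known_words) == []` and get a readable failure listing exactly which words Whisper dropped.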
Uses a manually-crafted video with laughter between 15-20s.
Test takes LaughFE's internal Feature time adjustment into account (see related
commit).
Note: very slow test
@see: df3c559
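Since LaughFE adjusts feature times internally (see commit df3c559), the test can't assert on the raw 15-20s window exactly. One way to sketch the comparison, with all names and the tolerance value being assumptions:

```python
def interval_matches(feature_start, feature_end,
                     expected_start=15.0, expected_end=20.0, tol=1.0):
    """True if a detected laughter interval lands within `tol` seconds
    of the expected window, absorbing LaughFE's time adjustment."""
    return (abs(feature_start - expected_start) <= tol
            and abs(feature_end - expected_end) <= tol)
```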
Functional tests for LoudAudioFeatureExtractor
Currently uses one manually-generated video whose audio is silent except
between 15-20s, where one or two sine tones are present
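If the manual fixture ever needs regenerating, the audio track can be produced with the stdlib alone. A sketch with a single tone (frequency, rate, and amplitude are assumptions, not the values used in the actual sample):

```python
import math
import struct
import wave

def write_test_tone(path, total_s=30, tone_start=15, tone_end=20,
                    freq=440.0, rate=8000):
    """Write a mono 16-bit WAV that is silent except for a sine tone
    in the [tone_start, tone_end) second window."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        frames = bytearray()
        for n in range(total_s * rate):
            t = n / rate
            amp = 0.8 if tone_start <= t < tone_end else 0.0
            sample = int(amp * 32767 * math.sin(2 * math.pi * freq * t))
            frames += struct.pack("<h", sample)
        wav.writeframes(bytes(frames))
```

The resulting WAV can then be muxed under a blank video track with ffmpeg.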
Only has a single property at present: SAMPLE_DIR for the path to where
sample videos are stored
TestVideoActivityFEFunctional now inherits from this instead of unittest.TestCase
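The shared base class described above might look like the following. The base class name and SAMPLE_DIR value are assumptions; only the single-property shape comes from the notes:

```python
import unittest

class FunctionalTestBase(unittest.TestCase):
    """Shared base for the functional test cases.

    Only holds SAMPLE_DIR, the path where sample videos are stored.
    """
    SAMPLE_DIR = "tests/functional/sample_videos"  # assumed path

class TestVideoActivityFEFunctional(FunctionalTestBase):
    """Inherits SAMPLE_DIR (and unittest.TestCase) via the base class."""
```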