The calls pulled out relate to setting up and running Whisper:
- _whispermodel()
- _batched_inference_pipeline()
- _transcribe()
Defaults defined: model, device, compute type, beam size, batch size, pipeline type
Tests:
- basic init
- init with no media
- run() with no words (early exit, 0 Features)
- run() with mocked transcribe
NOTE: these are unit tests and do not exercise Whisper
BREAKING CHANGE: passing no words to WFE is no longer an error; it now raises a
notice
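
A minimal sketch of the early-exit behaviour, assuming the notice is emitted
through the standard logging module at INFO level (Python has no built-in
NOTICE level). The class name, the dict shape of a transcribed word and the
filtering logic are hypothetical; only the _transcribe() name comes from the
list of calls above.

```python
import logging

logger = logging.getLogger(__name__)


class WordFeatureExtractorSketch:
    """Hypothetical stand-in for WordFeatureExtractor, not the real class."""

    def __init__(self, words=None):
        self.words = list(words or [])

    def _transcribe(self, path):
        # Placeholder for the slow Whisper call; unit tests mock this out.
        raise NotImplementedError

    def run(self, path):
        if not self.words:
            # Formerly an error: now log a notice and exit early with 0 Features.
            logger.info("no target words supplied, returning no Features")
            return []
        # Assumed shape: one dict per transcribed word.
        return [w for w in self._transcribe(path) if w["word"] in self.words]
```

Because _transcribe() is a separate method, a unit test can replace it with
unittest.mock.patch.object and never load Whisper.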
WordFeatureExtractor is not fast; even the import is slow. However, it processes
files and returns Features corresponding to matched words.
WhisperFE differs slightly from other FEs in that it searches for one or more
specific target words. Not specifying any could be treated as an error (this
commit does so), but a better approach may be to downgrade that to a (logging)
notice and simply match nothing / exit early.
Uses a manually-crafted video with laughter between 15-20s.
Test takes LaughFE's internal Feature time adjustment into account (see related
commit).
Note: very slow test
@see: df3c559
To help functional testing, LaughFE's internal adjustment times are exposed.
Recap: when a laugh is detected by LaughFE, the time of the laugh itself is not
used directly; instead, the resulting Feature has some time prepended to try to
capture the thing that caused the laugh.
When functional testing the FEs we set up specially-crafted videos with features
at known points. To make sure the LaughterFE is tested correctly, the tests
shift their expected times by the same amount the FE adjusts by, so that they
check the intended behaviour.
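
A rough sketch of how a test can reuse the exposed adjustment. The attribute
name PREPEND_SECONDS and its value are assumptions for illustration; the real
names and values live in LaughterFeatureExtractor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: float
    end: float


class LaughterFESketch:
    # Hypothetical attribute name and value, exposed so tests can read it.
    PREPEND_SECONDS = 7.0

    def adjust(self, laugh):
        """Prepend time so the Feature includes what caused the laugh."""
        start = max(0.0, laugh.start - self.PREPEND_SECONDS)
        return Interval(start, laugh.end)


def expected_interval(laugh_start, laugh_end, fe=LaughterFESketch):
    """Interval a functional test should expect for a laugh at a known time."""
    return Interval(max(0.0, laugh_start - fe.PREPEND_SECONDS), laugh_end)
```

The test computes its expectation from the FE's own constant, so changing the
adjustment does not require editing hard-coded times in the tests.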
@see:
- feature_extractors.py::LaughterFeatureExtractor
Functional tests for LoudAudioFeatureExtractor
Currently uses one manually-generated video with blank audio except between
15-20s where 1-2 sine tones are present
Only has a single property at present: SAMPLE_DIR for the path to where
sample videos are stored
TestVideoActivityFEFunctional now inherits from this instead of unittest.TestCase
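
For illustration, such a base class might look like the sketch below. The base
class name, the samples path and the sample() helper are hypothetical; only
SAMPLE_DIR and TestVideoActivityFEFunctional are named in this commit.

```python
import unittest
from pathlib import Path


class FunctionalTestCase(unittest.TestCase):
    """Hypothetical shared base for functional tests."""

    # Assumed location; the real path may differ.
    SAMPLE_DIR = Path("tests") / "samples"

    def sample(self, name):
        # Hypothetical convenience helper, not part of the described class.
        return self.SAMPLE_DIR / name


class TestVideoActivityFEFunctional(FunctionalTestCase):
    def test_sample_path(self):
        self.assertEqual(self.sample("a.mp4").name, "a.mp4")
```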
Problems fixed:
- Feature repr did not include feature_extractor since that API was changed
- Intervals that were equivalent were not equal, so Features were not properly
sorted or equal
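
One way to get value-based equality, ordering and a complete repr is a pair of
dataclasses, sketched below. This is an assumption about the shape of the fix,
not the actual implementation; the field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Interval:
    start: float
    end: float


@dataclass(frozen=True, order=True)
class Feature:
    # Fields compare in declaration order, so Features sort by interval first.
    interval: Interval
    source: str
    feature_extractor: str


a = Feature(Interval(15, 20), "a.mp4", "laughter")
b = Feature(Interval(15.0, 20.0), "a.mp4", "laughter")
c = Feature(Interval(3, 4), "a.mp4", "laughter")
```

Here a == b holds because the comparison is on field values, sorted([a, c])
puts c first, and the generated repr includes feature_extractor.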
This was done so the collection of loudnesses from an audio file could be mocked
in testing, but it also improves readability
TODO: review number of params and consider further refactoring
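
A small sketch of why the extraction helps testing. The class name, method name
_get_loudnesses(), the threshold and the (time, loudness) tuple shape are all
assumptions made for this example.

```python
from unittest.mock import patch


class LoudAudioFESketch:
    """Hypothetical stand-in for the loud-audio extractor."""

    def __init__(self, threshold=-20.0):
        self.threshold = threshold

    def _get_loudnesses(self, path):
        # The real method would read the audio and measure loudness per chunk.
        raise NotImplementedError

    def run(self, path):
        # Keep the times whose loudness reaches the threshold.
        return [t for t, loud in self._get_loudnesses(path) if loud >= self.threshold]


def demo():
    fake = [(0.0, -60.0), (15.0, -12.0), (16.0, -10.0), (21.0, -55.0)]
    with patch.object(LoudAudioFESketch, "_get_loudnesses", return_value=fake):
        return LoudAudioFESketch().run("ignored.mp4")
```

With the collection step isolated, the test supplies fixed loudness values and
never touches a media file.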
Outputs a representation of the Features extracted by the pipeline
The intent is to write an FE that takes this output as its input, so that a
pipeline can be 're-run'
Output JSON could also be used with external tools
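
A hedged sketch of what such a round trip could look like. The Feature fields
and the JSON layout are assumptions; the real output format may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Feature:
    # Hypothetical fields chosen for the example.
    source: str
    start: float
    end: float
    feature_extractor: str


def features_to_json(features):
    """Serialise Features to a JSON array of flat objects."""
    return json.dumps([asdict(f) for f in features], indent=2)


def features_from_json(text):
    """Rebuild Features from JSON, as a 're-run' FE might do."""
    return [Feature(**item) for item in json.loads(text)]
```

Keeping the objects flat means external tools can read the file without any
knowledge of the pipeline's classes.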
This paves the way for parts of the pipeline that do not produce videos,
such as JSON, images, clips etc
TODO: rename module video_producers → producers
Take the mean of non-overlapping windows of scores
Input: list of tuples in the format (time, score)
Output: list of tuples in the format (time, mean_score)
(reduced set)
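
A minimal sketch of the reduction. It assumes windows are a fixed number of
samples (not a fixed duration), that each output takes the time of the first
sample in its window, and that a trailing partial window is kept; the function
name and the default window size are made up for the example.

```python
from statistics import mean


def window_means(scores, window=5):
    """Reduce (time, score) tuples to one (time, mean_score) per window."""
    reduced = []
    for i in range(0, len(scores), window):
        chunk = scores[i:i + window]
        # Label the window with the time of its first sample.
        reduced.append((chunk[0][0], mean([s for _, s in chunk])))
    return reduced
```

For example, five samples with window=2 reduce to three tuples, the last one
covering the single leftover sample.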
Drops the lowest n% (default: 33%) of scdet scores, since scdet scores every
frame
Python being what it is, this could be a single line in another method
but pulling it out into another function:
- makes explicit what we are doing and lets us document why
- makes for easier testing
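
A sketch of such a helper, assuming (time, score) tuples as input and that the
number to drop is rounded down. The function name and the tie-handling (ties
are broken by position) are choices made for this example.

```python
def drop_lowest(scores, percent=33):
    """Return scores without the lowest `percent`%, keeping time order."""
    n_drop = int(len(scores) * percent / 100)
    # Indices of the n_drop lowest-scoring entries.
    by_score = sorted(range(len(scores)), key=lambda i: scores[i][1])
    lowest = set(by_score[:n_drop])
    return [pair for i, pair in enumerate(scores) if i not in lowest]
```

Selecting by index rather than by a score cutoff drops exactly n_drop entries
even when several frames share the same score.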
Uses pyloudnorm under the hood to determine the loudness of the supplied
media file (handles videos transparently)
TBC: some sort of limiter on the number produced