The calls pulled out relate to the setup and operation of Whisper:
- _whispermodel()
- _batched_inference_pipeline()
- _transcribe()
Defaults defined: model, device, compute type, beam size, batch size, pipeline type
Tests:
- basic init
- init with no media
- run() with no words (early exit, 0 Features)
- run() with mocked transcribe
NOTE: these are unit tests and do not exercise Whisper
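A minimal sketch of the "run() with mocked transcribe" style of test, assuming
unittest.mock is used. FakeWordExtractor and its (word, start, end) tuples are
hypothetical stand-ins for illustration, not the project's actual class or
data format:

```python
from unittest import mock


class FakeWordExtractor:
    """Hypothetical stand-in for the real extractor (illustration only)."""

    def __init__(self, words):
        self.words = words

    def _transcribe(self, path):
        # The real method would run Whisper; a unit test must never get here.
        raise RuntimeError("would load Whisper")

    def run(self, path):
        # Keep only (word, start, end) entries whose word is a target word.
        return [w for w in self._transcribe(path) if w[0] in self.words]


def run_with_mocked_transcribe():
    """Run the extractor with _transcribe patched out; return result and call count."""
    fake_words = [("hello", 0.0, 0.5), ("world", 0.5, 1.0)]
    extractor = FakeWordExtractor(words=["hello"])
    with mock.patch.object(
        FakeWordExtractor, "_transcribe", return_value=fake_words
    ) as patched:
        result = extractor.run("ignored.wav")
    return result, patched.call_count
```

Patching the private method keeps the matching logic under test while the
slow model call is never made.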
BREAKING CHANGE: passing no words to WFE is no longer an error; it now emits a notice
WordFeatureExtractor is not fast; even the import is slow. However, it processes
files and returns Features corresponding to matched words.
WhisperFE will differ slightly from other FEs in that it has specific target
words to search for. Not specifying these could be an error (this commit
treats it as one), but a better approach may be to downgrade that to a
(logging) notice and simply match nothing / exit early.
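A rough sketch of the "notice and early exit" alternative described above.
The function name, the `transcribe` callable and the tuple format are
assumptions for illustration. Python's logging module has no NOTICE level,
so INFO is used here:

```python
import logging

logger = logging.getLogger(__name__)


def extract_word_features(words, transcribe):
    """Return transcript entries matching the target words.

    `transcribe` is a hypothetical zero-argument callable that returns
    (word, start, end) tuples; it is only called when there is work to do.
    """
    if not words:
        # Not an error: log a notice and match nothing.
        logger.info("No target words specified; returning 0 Features")
        return []
    targets = {w.lower() for w in words}
    return [entry for entry in transcribe() if entry[0].lower() in targets]
```

Exiting before `transcribe` is called means the expensive step is skipped
entirely when there is nothing to search for.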
Problems fixed:
- Feature repr did not include feature_extractor since that API was changed
- Intervals that were equivalent were not equal, so Features were not properly
sorted or equal
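One way to get value-based equality and ordering for intervals is a frozen,
ordered dataclass. This is only a sketch: the field names `start` and `end`
are assumptions, and the project's Interval may be implemented differently:

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Interval:
    """Hypothetical value-type interval, compared by (start, end)."""

    start: float
    end: float


def sort_intervals(intervals):
    """Sort intervals by start, then end, using the generated ordering."""
    return sorted(intervals)
```

With `order=True` the comparison methods are generated from the fields, so
two intervals with the same bounds compare equal and sort consistently;
`frozen=True` also makes them hashable.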
This was done so that the collection of loudnesses from an audio file could be
mocked in testing, but it also improves readability
TODO: review number of params and consider further refactoring
Outputs a representation of the Features extracted by the pipeline
Intent is to write a FE that takes the output so that a pipeline can be
're-run'
Output JSON could also be used with external tools
This paves the way for parts of the pipeline that do not produce videos,
such as JSON, images, clips, etc.
TODO: rename module video_producers → producers
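As an illustration of the JSON output and the planned 're-run' extractor
described above, a round trip could look roughly like this. FeatureRecord
and its fields are hypothetical; the real Feature holds richer objects
(for example a Source) that would need their own serialisation:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FeatureRecord:
    """Hypothetical flat record for one Feature (illustration only)."""

    source: str
    feature_extractor: str
    start: float
    end: float
    score: float


def features_to_json(features):
    """Serialise a list of FeatureRecord objects to a JSON string."""
    return json.dumps([asdict(f) for f in features], indent=2)


def features_from_json(text):
    """Rebuild FeatureRecord objects from JSON written by features_to_json."""
    return [FeatureRecord(**item) for item in json.loads(text)]
```

Keeping the records flat and free of custom types is what would let external
tools read the same file.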
Take the mean of non-overlapping windows of scores
Input: list of tuples in the format (time, score)
Output: list of tuples in the format (time, mean_score)
(reduced set)
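A minimal sketch of the reduction described above. Two assumptions are made
here that the commit does not state: a window is a fixed number of
consecutive samples (`window_size`), and each window is labelled with the
time of its first sample:

```python
from statistics import fmean


def window_means(scores, window_size):
    """Reduce (time, score) tuples to one (time, mean_score) per window."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    reduced = []
    # Step by window_size so that windows never overlap.
    for i in range(0, len(scores), window_size):
        chunk = scores[i:i + window_size]
        time = chunk[0][0]  # assumption: label the window with its first time
        reduced.append((time, fmean(score for _, score in chunk)))
    return reduced
```

The last window may be shorter than the others; its mean is taken over
whatever samples remain.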
Drops the lowest n% (default: 33%) of scdet scores, since scdet scores every
frame
Python being what it is, this could be a single line in another method,
but pulling it out into another function:
- makes explicit what we are doing and lets us document why
- makes for easier testing
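A sketch of what such a pulled-out function might look like. The name, the
(time, score) input format and rounding the drop count down are assumptions,
not details taken from the commit:

```python
def drop_lowest(scores, percent=33):
    """Drop the lowest-scoring `percent` of (time, score) tuples.

    The surviving items keep their original order.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Assumption: round the number of dropped items down.
    n_drop = int(len(scores) * percent / 100)
    # Indices of the n_drop lowest scores.
    by_score = sorted(range(len(scores)), key=lambda i: scores[i][1])
    lowest = set(by_score[:n_drop])
    return [item for i, item in enumerate(scores) if i not in lowest]
```

Because the count is rounded down, very short inputs (fewer than four items
at the default 33%) are returned unchanged.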
Uses pyloudnorm under the hood to determine the loudness of the supplied
media file (handles videos transparently)
TBC: some sort of limiter on the number produced
BREAKING CHANGE: source now refers to a Source object, the FE that
created the Feature is now referred to by feature_extractor; path is
dropped
This should be more consistent, plus we needed a reference to the
original Source kept around anyway -- path worked but a Source object is
more consistent and explicit about intent
This adds functionality for getting laughter by using jrgillick's
laughter detection library
NB: Python expects all of the feature extractor's dependencies to be
available; perhaps in future we can do something even fancier like
activating another Python env
[retroactive commit]