TargetTimeAdjuster will adjust a list of Features until their total duration is
within an optional margin of a target duration.
Helper functions (sketched below):
- _determine_margin() :: figure out the max and min cutoff times, considering
margin and margin strategy (percent / absolute)
- _features_total_time() :: basic sum of list of Features' durations
TODO: rename to TargetDurationAdjuster ? rename 'strategy' ??
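A minimal sketch of those helpers, assuming seconds throughout and that a
Feature exposes its duration via its Interval (names beyond those listed above
are assumptions):

    class TargetTimeAdjuster:
        """Sketch only -- the real class lives in the adjusters module."""

        def _determine_margin(self, target: float, margin: float,
                              strategy: str) -> tuple[float, float]:
            """Return (minimum, maximum) acceptable total durations, in seconds."""
            if strategy == "percent":
                # e.g. target=60, margin=10 -> accept totals between 54s and 66s
                delta = target * (margin / 100.0)
            elif strategy == "absolute":
                # e.g. target=60, margin=5 -> accept totals between 55s and 65s
                delta = margin
            else:
                raise ValueError(f"Unknown margin strategy: {strategy}")
            return target - delta, target + delta

        def _features_total_time(self, features) -> float:
            # Assumes each Feature exposes its duration via its Interval.
            return sum(f.interval.duration for f in features)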
Adjusters will be used to modify a list of Features. This could either be:
- to modify the overall set (eg to target a time)
- to modify individual Features
The most important Adjuster will be one that targets an overall time, eg:
"modify this list of Features such that their times add up to 1 minute (either ±
a % or a hard limit)"
@see: feature_extractors.py::FeatureExtractor
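As a rough illustration of the intended shape -- the base-class name and run()
signature here are assumptions, not confirmed by this commit:

    from abc import ABC, abstractmethod

    class Adjuster(ABC):
        """Modifies a list of Features: either the overall set or individual Features."""

        @abstractmethod
        def run(self, features: list) -> list:
            """Return the adjusted list of Features."""

    # e.g. TargetTimeAdjuster(target=60.0, margin=10.0, strategy="percent").run(features)
    # would drop or trim Features until the total duration lands within the margin.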
Calls pulled out relate to the setup and operation of Whisper (see the sketch below):
- _whispermodel()
- _batched_inference_pipeline()
- _transcribe()
Defaults defined: model, device, compute type, beamsize, batchsize, pipeline type
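The commit does not name the library, but helper names like
_batched_inference_pipeline() and the compute-type default suggest
faster-whisper; a minimal sketch under that assumption (the defaults shown here
are placeholders, and exact transcribe() parameters vary by faster-whisper
version):

    from faster_whisper import WhisperModel, BatchedInferencePipeline

    def _whispermodel(model_size="base", device="cpu", compute_type="int8"):
        # Placeholder defaults -- the real defaults live in the FE.
        return WhisperModel(model_size, device=device, compute_type=compute_type)

    def _batched_inference_pipeline(model):
        return BatchedInferencePipeline(model=model)

    def _transcribe(pipeline, path, beam_size=5, batch_size=16):
        # word_timestamps=True gives per-word times to turn into Features.
        segments, info = pipeline.transcribe(
            path, beam_size=beam_size, batch_size=batch_size, word_timestamps=True
        )
        return list(segments), info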
Tests:
- basic init
- init with no media
- run() with no words (early exit 0 Features)
- run() with mocked transcribe
NOTE: these are unit tests and do not exercise Whisper
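A sketch of how the mocked-transcribe test might look (the module path, class
name, features attribute and return shape are all assumptions):

    from unittest import mock
    from feature_extractors import WordFeatureExtractor  # assumed module path

    def test_run_with_mocked_transcribe():
        fe = WordFeatureExtractor(media=mock.Mock(), words=["hello"])
        fake_word = mock.Mock(word="hello", start=1.0, end=1.4)
        fake_segment = mock.Mock(words=[fake_word])
        # Assumes _transcribe returns an iterable of segments with word timings.
        with mock.patch.object(fe, "_transcribe", return_value=[fake_segment]) as transcribe:
            fe.run()
        transcribe.assert_called_once()
        # With one matched word we would expect one Feature back.
        assert len(fe.features) == 1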
BREAKING CHANGE: passing no words to WFE is no longer an error; it raises a notice
WordFeatureExtractor is not fast -- even the import is slow. However, it processes
files and returns Features corresponding to matched words.
WhisperFE will be slightly different to other FEs in that there are specific
target words to be searched for. Not specifying these could be an error (this
commit treats it as such), but a better approach may be to downgrade that to a
(logging) notice and simply match nothing / exit early.
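A minimal sketch of that notice-and-early-exit behaviour (class, attributes and
logger usage are assumptions, not the actual implementation):

    import logging

    logger = logging.getLogger(__name__)

    class WordFeatureExtractor:
        def __init__(self, media=None, words=None):
            self.media = media
            self.words = words or []
            self.features = []

        def run(self):
            if not self.words:
                # Downgraded from an error: log a notice and match nothing.
                logger.warning("No target words supplied; matching nothing")
                return
            # ... transcribe the media and match target words here ...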
To help functional testing, LaughFE's internal adjustment times are exposed.
Recap: when a laugh is detected by LaughFE, the time of the laugh itself is not
used directly; instead, the resulting Feature has some time prepended to try to
capture the thing that caused the laugh.
When functional testing the FEs, we set up specially-crafted videos with
features at known points. To make sure the LaughterFE is tested correctly, we
shift the expected times in the tests by the amount the FE adjusts by, so that
the tests exercise the intended behaviour.
@see:
- feature_extractors.py::LaughterFeatureExtractor
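A sketch of how an exposed adjustment could be used in a functional test; the
attribute name, fixture path and import are all hypothetical:

    from feature_extractors import LaughterFeatureExtractor  # assumed module path

    def test_laughter_feature_start_time():
        fe = LaughterFeatureExtractor(media="tests/fixtures/laugh_at_10s.mp4")
        fe.run()
        # The fixture laugh starts at 10.0s, but the FE prepends time to capture
        # whatever caused the laugh, so shift the expectation by that amount.
        expected_start = 10.0 - fe.PREPEND_SECONDS
        assert fe.features[0].interval.start == expected_start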
Problems fixed:
- Feature repr did not include feature_extractor since that API was changed
- Intervals that were equivalent did not compare as equal, so Features were not
sorted or compared correctly
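A minimal sketch of the kind of comparison fix implied here, using
functools.total_ordering; the real Interval class may differ:

    from functools import total_ordering

    @total_ordering
    class Interval:
        def __init__(self, start: float, end: float):
            self.start = start
            self.end = end

        def __eq__(self, other):
            # Equivalent intervals now compare equal, so Features sort/compare correctly.
            return (self.start, self.end) == (other.start, other.end)

        def __lt__(self, other):
            return (self.start, self.end) < (other.start, other.end)

        def __repr__(self):
            return f"Interval({self.start}, {self.end})"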
This was done so the collection of loudnesses from an audio file could be mocked
in testing, but it also improves readability
TODO: review number of params and consider further refactoring
Outputs a representation of the Features extracted by the pipeline
The intent is to write an FE that takes this output so that a pipeline can be
're-run'
Output JSON could also be used with external tools
This paves the way for parts of the pipeline that do not produce videos,
such as JSON, images, clips etc
TODO: rename module video_producers → producers
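A sketch of what the producer's output might look like; the field names are
assumptions based on the Feature attributes mentioned elsewhere in this log:

    import json

    def produce_json(features, path="features.json"):
        payload = [
            {
                "feature_extractor": f.feature_extractor,
                "source": str(f.source),
                "start": f.interval.start,
                "end": f.interval.end,
                "score": f.score,
            }
            for f in features
        ]
        with open(path, "w") as fh:
            json.dump(payload, fh, indent=2)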
Take the mean of non-overlapping windows of scores
Input: list of tuples in the format (time, score)
Output: list of tuples in the format (time, mean_score)
(reduced set)
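A sketch of the windowed mean, assuming a fixed window length in seconds:

    def window_means(scores, window=1.0):
        """Collapse (time, score) tuples into one (time, mean_score) per window."""
        out = []
        bucket, bucket_start = [], None
        for time, score in sorted(scores):
            if bucket_start is None:
                bucket_start = time
            if time - bucket_start >= window:
                out.append((bucket_start, sum(bucket) / len(bucket)))
                bucket, bucket_start = [], time
            bucket.append(score)
        if bucket:
            out.append((bucket_start, sum(bucket) / len(bucket)))
        return out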
Drops the lowest n% (default: 33%) of scdet scores, since scdet scores every
frame
Python being what it is, this could be a single line in another method
but pulling it out into another function:
- makes explicit what we are doing and lets us document why
- makes for easier testing
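The pulled-out function might look roughly like this (the signature is an
assumption; the default matches the 33% above):

    def _drop_lowest(scores, fraction=0.33):
        """Drop the lowest-scoring fraction of (time, score) tuples.

        scdet scores every frame, so most scores are noise; dropping the
        bottom third keeps only the more interesting candidates.
        """
        if not scores:
            return []
        keep_from = int(len(scores) * fraction)
        by_score = sorted(scores, key=lambda ts: ts[1])
        return sorted(by_score[keep_from:])  # restore time order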
Uses pyloudnorm under the hood to determine the loudness of the supplied
media file (handles videos transparently)
TBC: some sort of limiter on the number produced
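A minimal sketch of the underlying pyloudnorm usage; the per-window collection
and the video-handling wrapper are assumptions about what the FE does around it:

    import soundfile as sf
    import pyloudnorm as pyln

    def loudnesses(audio_path, window_s=1.0):
        """Integrated loudness per non-overlapping window, as (time, loudness) tuples."""
        data, rate = sf.read(audio_path)   # the FE extracts audio from video first
        meter = pyln.Meter(rate)           # ITU-R BS.1770 meter
        step = int(window_s * rate)
        return [
            (i / rate, meter.integrated_loudness(data[i:i + step]))
            for i in range(0, len(data), step)
            if len(data[i:i + step]) >= step
        ]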
BREAKING CHANGE: source now refers to a Source object, the FE that
created the Feature is now referred to by feature_extractor; path is
dropped
This should be more consistent, plus we needed to keep a reference to the
original Source around anyway -- path worked, but a Source object is more
explicit about intent
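After this change, a Feature might be shaped roughly like this (a sketch only;
the types are assumptions):

    from dataclasses import dataclass

    @dataclass
    class Feature:
        interval: "Interval"
        feature_extractor: str   # name of the FE that created this Feature
        source: "Source"         # the original Source object (replaces path)
        score: float = 0.0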
This adds functionality for getting laughter by using jrgillick's
laughter detection library
NB python expects all of the feature extractor's dependencies to be
available; perhaps in future we can do something even fancier like
activating another python env
[retroactive commit]
This composes an Interval with some ancillary data (sketched below):
- source (what feature extractor did we get this from)
- path (where is it now on the FS)
- score (for ranking)
TODO double check we aren't overloading the word 'source' to have two
meanings
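As originally introduced, that composition might be sketched as follows (later
superseded by the source/feature_extractor breaking change above; field types
are assumptions):

    from dataclasses import dataclass

    @dataclass
    class Feature:
        interval: "Interval"   # start/end times within the media
        source: str            # which feature extractor produced this
        path: str              # where the media is on the filesystem
        score: float = 0.0     # for ranking against other Features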