* Highlight Generator Pipeline Planning

** Overview

[[file:~/downloads/highlightgeneration-process.svg]]

** Pipeline `API'

*** Input

User-driven selection of input videos.

- files :: /user-selected/ list of ≥1 input files to be processed
- (optional) time restriction :: start and end time (or: start time + duration?) of the file to restrict highlight generation to (format: s & H:M:S?) (see note)
- (optional, /stretch/) feature extractor mapping :: a map of files to feature extractors, eg:

#+BEGIN_SRC yaml
video1:
  path: /video/directory/video1.mkv
  feature_extractors:
    - laughter-detection
    - loud-moments

video2:
  path: /video/directory/video2.mp4
  start: 10:00
  end: 50:00
  feature_extractors:
    - word-recognition
#+END_SRC

**** Time Restriction

To properly operate on a restricted range, this step can create a temporary media file (using eg ffmpeg) for the *Feature Selection* step to operate on.
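
As a rough sketch (assuming a Python driver and an ffmpeg binary on the path; the function name and temp-file handling are illustrative):

#+BEGIN_SRC python
import subprocess
import tempfile

def make_restricted_clip(path: str, start: float, end: float) -> str:
    """Copy the [start, end] range of `path` (seconds) into a temporary
    file and return its path. `-c copy` avoids re-encoding, at the cost
    of cutting on keyframes rather than exact timestamps."""
    out = tempfile.NamedTemporaryFile(suffix=".mkv", delete=False).name
    subprocess.run(
        ["ffmpeg", "-y", "-ss", str(start), "-i", path,
         "-t", str(end - start), "-c", "copy", out],
        check=True,
    )
    return out
#+END_SRC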

Creating a temporary file can be avoided by ensuring each /Feature Selector/ respects a custom duration, but since some are third-party, they may need their implementations updated.
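
One possible shape for `respecting a custom duration': each selector takes an optional time window instead of always scanning the whole file. This is a hypothetical interface, not an existing one:

#+BEGIN_SRC python
from typing import Optional, Protocol

class FeatureSelector(Protocol):
    def select(self, path: str,
               window: Optional[tuple[float, float]] = None) -> list[dict]:
        """Return detected features for `path`, restricted to the
        (start, end) seconds in `window` when one is given."""
        ...
#+END_SRC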

*Discussion point*: pros and cons of updating 3P feature selectors

**** Output

Conceptually, a list of files (either the original path, or the path to a temporary time-restricted version) and associated options for each. This will either be a language-specific object or the equivalent JSON.

Example:

#+BEGIN_SRC json
{
  "source_videos": [
    {
      "path": "/video/directory/video1.mkv",
      "feature_extractors": [
        "laughter-detection",
        "loud-moments"
      ]
    },
    {
      "path": "/tmp/videohighlights/inputclip0001.mkv",
      "feature_extractors": [
        "word-recognition"
      ]
    }
  ]
}
#+END_SRC
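
For the language-specific variant, a minimal sketch (assuming Python; the field names mirror the JSON above, the class names are assumptions):

#+BEGIN_SRC python
from dataclasses import dataclass, field

@dataclass
class SourceVideo:
    path: str
    feature_extractors: list[str] = field(default_factory=list)

@dataclass
class InputStageOutput:
    source_videos: list[SourceVideo] = field(default_factory=list)
#+END_SRC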

**** Further Considerations

- time specification formats -- start & end? start & duration? either? negative times for specifying time distance from the /end/ of the file? (see the sketch below)
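
A sketch of one possible normalisation, accepting plain seconds or H:M:S and treating a leading ~-~ as distance from the end of the file (the convention chosen here is illustrative, not decided):

#+BEGIN_SRC python
def to_seconds(spec: str, file_duration: float) -> float:
    """Normalise '90', '1:30' or '1:30:00' to seconds; '-30' means
    30 seconds before the end of the file."""
    parts = [float(p) for p in spec.lstrip("-").split(":")]
    seconds = sum(p * 60 ** i for i, p in enumerate(reversed(parts)))
    return file_duration - seconds if spec.startswith("-") else seconds
#+END_SRC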

*** Source / Feature Selection

A ~Source~ is an automation-driven /method/ of figuring out which bits of an *input video* to keep.

**** Input

A list of input videos as in *Input*.

**** Options

- ~Source~-specific /options/ (eg min duration, threshold, etc for laughter-detection)
- /working directory/
- minimum duration (see /Further Considerations/)

**** Output

A set of ≥0 /Feature/-type objects or equivalent JSON.
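
A sketch of what a /Feature/-type object might carry; the fields are assumptions based on the consolidation input shown later:

#+BEGIN_SRC python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Feature:
    source: str                    # which feature extractor produced this
    time: Optional[float] = None   # point timestamp (seconds), or...
    start: Optional[float] = None  # ...a time range
    end: Optional[float] = None
#+END_SRC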

**** Further Considerations

At the time of writing, the feature selection drivers output `point' timestamps rather than durations, but durations make more sense conceptually. It may be worthwhile to automatically promote any `point' timestamps to a duration.

- Pros :: makes the next step in the pipeline more uniform
- Cons :: will probably over-sample
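
Promotion itself could be as simple as padding either side of the point (the padding amounts here are placeholders):

#+BEGIN_SRC python
def promote(time: float, before: float = 5.0,
            after: float = 10.0) -> tuple[float, float]:
    """Turn a point timestamp into a (start, end) range."""
    return (max(0.0, time - before), time + after)
#+END_SRC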

Consequent consideration: does that mean we should let the user adjust the /pre-consolidation/ times too? Probably, but doing that in a UX-friendly way will potentially take some doing.

*** Consolidation

The consolidation stage takes a list of timestamps (across videos) and /consolidates/ (merges) proximal times to create *clip definitions* across *sources*.

- input :: a list of *video files* with associated *timestamps* or *time ranges* (and their sources), eg in JSON:

#+BEGIN_SRC json
{ "videos": { "/path/to/videos/video1.mp4": [ { "time": 180, "source": "laughter-detect" },
                                              { "time": 187, "source": "laughter-detect" },
                                              { "time": 295, "source": "loud-detect" },
                                              { "time": 332, "source": "laughter-detect" } ],
              "/path/to/videos/video2.mp4": [ { "start": 45, "end": 130, "source": "segmenter" } ] }
}
#+END_SRC

**** Approach

The input list of feature times goes through a pairwise comparison process: if a feature /overlaps/ another (that is, starts or ends within the time period of another feature), the two features are consolidated into one. The comparison can be done with a /delta/, a small amount of time (eg 15s) within which `nearby' intervals are also considered overlapping.
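
A minimal sketch of that merge, assuming point timestamps have already been promoted to (start, end, source) tuples; the defaults follow the options below, and the names are illustrative:

#+BEGIN_SRC python
def consolidate(features: list[tuple[float, float, str]],
                delta: float = 15.0,
                max_duration: float = 60.0) -> list[tuple[float, float, str]]:
    """Merge intervals that overlap or sit within `delta` seconds of
    each other, never growing a merged clip past `max_duration`."""
    merged: list[tuple[float, float, str]] = []
    for start, end, source in sorted(features):
        if merged:
            m_start, m_end, m_source = merged[-1]
            if (start <= m_end + delta
                    and max(end, m_end) - m_start <= max_duration):
                merged[-1] = (m_start, max(end, m_end),
                              m_source + "+" + source)
                continue
        merged.append((start, end, source))
    return merged
#+END_SRC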

**** Options

- maximum delta between clips to be consolidated (default: 15s [rationale: Stetson-Harrison approach[fn:stetson]])
- maximum duration of a consolidated clip (default: 60s [rationale: max duration of YouTube Shorts?])
- maximum number of consolidated clips to output (default: unlimited)

[fn:stetson] The [[https://english.stackexchange.com/a/426717][Stetson-Harrison]] approach

*** Refinement

A user-driven process of /selecting/ clips and applying /operators/ to them before final output.

**** 1. Selection

User choice of which clips to keep.

**** 2. Process

User applies (video) ~Processes~ to the clip(s):

- duration :: select start and end time (possibly before/after the generated clip's boundaries)
- join :: further join clips which were not joined at the consolidation stage (see the sketch after this list)
- filters :: eg sharpen / slomo / (de)saturate etc [stretch]
- split :: divide one consolidated clip into multiple clips
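
For /join/, ffmpeg's concat demuxer could do the heavy lifting; a sketch (list-file handling and cleanup are glossed over, and the function name is illustrative):

#+BEGIN_SRC python
import subprocess
import tempfile

def join_clips(clip_paths: list[str], out_path: str) -> None:
    """Losslessly concatenate same-codec clips into `out_path`."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt",
                                     delete=False) as listing:
        for p in clip_paths:
            listing.write(f"file '{p}'\n")
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", listing.name, "-c", "copy", out_path],
        check=True,
    )
#+END_SRC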

*Note*: we need to be careful not to reimplement an NLE (non-linear editor) here!

*** Highlights

The ultimate output: the final, user-refined highlight clips.