diff --git a/highlight-pipeline-planning.org b/highlight-pipeline-planning.org
new file mode 100644
index 0000000..1a72c71
--- /dev/null
+++ b/highlight-pipeline-planning.org
@@ -0,0 +1,148 @@
+* Highlight Generator Pipeline Planning
+
+** Overview
+
+[[file:~/downloads/highlightgeneration-process.svg]]
+
+** Pipeline `API'
+
+*** Input
+
+User-driven selection of input videos.
+
+ - files :: /user-selected/ list of ≥1 input files to be processed
+ - (optional) time restriction :: start and end time (or: start time + duration?) of file to restrict highlight generation to (format: s & H:M:s ?) (see note)
+ - (optional, /stretch/) feature extractor mapping :: a map of files to feature extractors, eg:
+
+#+BEGIN_SRC yaml
+video1:
+  path: /video/directory/video1.mkv
+  feature_extractors:
+    - laughter-detection
+    - loud-moments
+
+video2:
+  path: /video/directory/video2.mp4
+  start: 10:00
+  end: 50:00
+  feature_extractors:
+    - word-recognition
+#+END_SRC
+
+**** Time Restriction
+
+To properly operate on a restricted range, this can create a temporary media file (using eg ffmpeg) for the *Feature Selection* step to operate on.
+
+Creating a temporary file can be avoided by ensuring each /Feature Selector/ respects a custom duration, but since some are third-party they may need their implementation updated.
+
+*Discussion point*: pros and cons of updating 3P feature selectors
+
+**** Output
+
+Conceptually, a list of files (either original path, or path to temporary time-restricted version) and associated options for each. This will either be a language-specific object, or the equivalent JSON.
+
+Example:
+
+#+BEGIN_SRC json
+{
+  "source_videos": [
+    { "path": "/video/directory/video1.mkv",
+      "feature_extractors": [
+        "laughter-detection",
+        "loud-moments"
+      ]
+    },
+    { "path": "/tmp/videohighlights/inputclip0001.mkv",
+      "feature_extractors": [
+        "word-recognition"
+      ]
+    }
+  ]
+}
+#+END_SRC
+
+**** Further Considerations
+
+ - time specification formats -- start & end? start & duration? either? negative times for specifying time distance from /end/ of file?
+
+*** Source / Feature Selection
+
+A ~Source~ is an automation-driven /method/ of figuring out what bits of an *input video* to keep.
+
+**** Input
+
+A list of input videos as in *Input*
+
+**** Options
+
+ - ~Source~-specific /options/ (eg min duration, threshold etc for laughter-detection)
+ - /working directory/
+ - minimum duration (see /Further Considerations/)
+
+**** Output
+
+A set of ≥0 /Feature/-type objects or equivalent JSON.
+
+**** Further Considerations
+
+At time of writing the feature selection drivers output timestamps, as opposed to durations; but conceptually durations make more sense. It may be worthwhile to automatically promote any `point' timestamps to a duration.
+
+Pros: makes the next step in the pipeline more uniform
+Cons: will probably over-sample
+
+Consequent consideration: does that mean we should let the user adjust the /pre-consolidation/ times too? Probably, but doing that in a UX-friendly way will potentially take some doing.
+
+*** Consolidation
+
+The consolidation stage takes a list of timestamps (across videos) and /consolidates/ / merges proximal times to create *clip definitions* across *sources*.
+
+ - input :: a list of *video files* with associated *timestamps* or *time ranges* (and their sources), eg in JSON:
+
+   #+BEGIN_SRC json
+   { "videos": { "/path/to/videos/video1.mp4": [ { "time": 180, "source": "laughter-detect" },
+                                                 { "time": 187, "source": "laughter-detect" },
+                                                 { "time": 295, "source": "loud-detect" },
+                                                 { "time": 332, "source": "laughter-detect" }
+                                               ],
+                 "/path/to/videos/video2.mp4": [ { "start": 45, "end": 130, "source": "segmenter" } ]
+               }
+   }
+   #+END_SRC
+
+**** Approach
+
+The input list of feature times goes through a comparison process: if a feature /overlaps/ another (that is, starts or ends within the time period of another feature), those two features are consolidated into one. This comparison can be done with a /delta/, a small amount of time (eg 15s) within which `nearby' intervals are considered overlapping.
+
+
+**** Options
+
+ - maximum delta between clips to be consolidated (default: 15s [rationale: Stetson-Harrison approach[fn:stetson]])
+ - maximum duration of a consolidated clip (default: 60s [rationale: max duration of YT shorts?])
+ - maximum number of consolidated clips to output (default: unlimited)
+
+
+[fn:stetson] The [[https://english.stackexchange.com/a/426717][Stetson-Harrison]] approach
+
+*** Refinement
+
+User-driven process of /selection/ and applying /operators/ to clips before final output.
+
+**** 1. Selection
+
+User choice of which clips to keep.
+
+**** 2. Process
+
+User applies (video) ~Processes~ to the clip(s):
+
+ - duration :: select start and end time (possibly before/after generated clip's boundaries)
+ - join :: further join clips which were not joined at consolidation stage
+ - filters :: eg sharpen / slomo / (de)saturate etc [stretch]
+ - split ::
+
+*Note*: need to be careful not to reimplement an NLE here!
+
+*** Highlights
+
+Ultimate output.
+
diff --git a/meetings.org b/meetings.org
index 8524387..a90cf25 100644
--- a/meetings.org
+++ b/meetings.org
@@ -132,3 +132,11 @@ Next steps for coming week:
 
 [[https://roberthallam.com/files/highlights-infographic-process.png]]
 
+*** Pipeline Planning
+
+[[file:highlight-pipleline2.svg]]
+
+[[https://roberthallam.com/files/highlight-pipeline2.svg]]
+
+see [[file:highlight-pipeline-planning.org][highlight-pipeline-planning.org]]
+