
[meeting 2] add pipeline planning doc

main
Rob Hallam 3 months ago
parent
commit
6d69f043a0
2 changed files with 156 additions and 0 deletions
  1. highlight-pipeline-planning.org (+148, -0)
  2. meetings.org (+8, -0)

highlight-pipeline-planning.org

@@ -0,0 +1,148 @@
* Highlight Generator Pipeline Planning

** Overview

[[file:~/downloads/highlightgeneration-process.svg]]

** Pipeline `API'

*** Input

User-driven selection of input videos.

- files :: /user-selected/ list of ≥1 input files to be processed
- (optional) time restriction :: start and end time (or: start time + duration?) within the file, restricting highlight generation to that range (format: seconds and/or H:M:S?) (see /Time Restriction/)
- (optional, /stretch/) feature extractor mapping :: a map of files to feature extractors, eg:

#+BEGIN_SRC yaml
video1:
  path: /video/directory/video1.mkv
  feature_extractors:
    - laughter-detection
    - loud-moments

video2:
  path: /video/directory/video2.mp4
  start: "10:00"  # quoted: YAML 1.1 parsers read an unquoted 10:00 as the base-60 integer 600
  end: "50:00"
  feature_extractors:
    - word-recognition
#+END_SRC
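A minimal sketch of how the mapping above might be held in memory once a YAML parser has turned it into a plain dict. ~InputVideo~ and ~parse_mapping~ are hypothetical names, and start/end are kept as raw strings because the time format is still an open question.

#+BEGIN_SRC python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InputVideo:
    """One user-selected input file (hypothetical structure)."""
    name: str
    path: str
    feature_extractors: list[str] = field(default_factory=list)
    start: Optional[str] = None  # raw time string, eg "10:00"; format not yet decided
    end: Optional[str] = None

    @property
    def is_time_restricted(self) -> bool:
        return self.start is not None or self.end is not None


def parse_mapping(mapping: dict) -> list[InputVideo]:
    """Convert {name: {path, feature_extractors, start?, end?}} into InputVideo objects."""
    videos = []
    for name, spec in mapping.items():
        if "path" not in spec:
            raise ValueError(f"{name}: missing 'path'")
        videos.append(InputVideo(
            name=name,
            path=spec["path"],
            feature_extractors=list(spec.get("feature_extractors", [])),
            start=spec.get("start"),
            end=spec.get("end"),
        ))
    if not videos:  # the plan requires at least one input file
        raise ValueError("at least one input file is required")
    return videos
#+END_SRC

Entries with ~is_time_restricted~ set are the ones that would need the temporary-file treatment described under /Time Restriction/.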

**** Time Restriction

To operate on a restricted range, this step can create a temporary media file (using eg ffmpeg) for the *Feature Selection* step to work on.

Creating a temporary file can be avoided by ensuring each /Feature Selector/ respects a custom time range, but since some are third-party, their implementations may need updating.

*Discussion point*: pros and cons of updating 3P feature selectors
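The temporary-file option could be sketched as building an ffmpeg command that copies the restricted range into the working directory. ~trim_command~ is a hypothetical helper; it assumes ffmpeg is on the PATH and that times have already been converted to seconds. The ~inputclipNNNN~ file naming mirrors the example output path.

#+BEGIN_SRC python
from pathlib import Path


def trim_command(src: str, start_s: float, end_s: float,
                 out_dir: str, index: int) -> tuple[list[str], str]:
    """Build an ffmpeg argv copying [start_s, end_s) of src into a temporary file.

    '-ss' before '-i' seeks the input; '-t' limits the output duration.
    '-c copy' avoids re-encoding, so the cut points may not be frame-accurate.
    """
    if end_s <= start_s:
        raise ValueError("end must be after start")
    out = str(Path(out_dir) / f"inputclip{index:04d}{Path(src).suffix}")
    argv = [
        "ffmpeg", "-y",
        "-ss", f"{start_s:.3f}",
        "-i", src,
        "-t", f"{end_s - start_s:.3f}",
        "-c", "copy",
        out,
    ]
    return argv, out
#+END_SRC

Running it would then be ~subprocess.run(argv, check=True)~. Re-encoding instead of stream copying would give exact boundaries at the cost of time, which is one input to the discussion point above.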

**** Output

Conceptually, a list of files (either the original path or the path to a temporary time-restricted version), each with its associated options. This will be either a language-specific object or the equivalent JSON.

Example:

#+BEGIN_SRC json
{
  "source_videos": [
    { "path": "/video/directory/video1.mkv",
      "feature_extractors": [
        "laughter-detection",
        "loud-moments"
      ]
    },
    { "path": "/tmp/videohighlights/inputclip0001.mkv",
      "feature_extractors": [
        "word-recognition"
      ]
    }
  ]
}
#+END_SRC
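As a sketch of the "language-specific object, or the equivalent JSON" idea, a small dataclass can serialise straight to the shape above. ~SourceVideo~, ~effective_path~ and ~input_stage_output~ are hypothetical names.

#+BEGIN_SRC python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class SourceVideo:
    """One entry of the Input stage output (hypothetical structure)."""
    path: str  # original file, or the temporary time-restricted copy
    feature_extractors: list[str] = field(default_factory=list)


def effective_path(original: str, temp_copy: Optional[str]) -> str:
    """Prefer the temporary time-restricted copy when one was created."""
    return temp_copy if temp_copy is not None else original


def input_stage_output(videos: list[SourceVideo]) -> str:
    """Serialise the Input stage result as {"source_videos": [...]}."""
    return json.dumps({"source_videos": [asdict(v) for v in videos]}, indent=2)
#+END_SRC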

**** Further Considerations

- time specification formats: start & end? start & duration? either? negative times to specify distance from the /end/ of the file?
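One way to keep these options open is to accept all of them. This sketch assumes "S", "M:S" and "H:M:S" strings, treats a leading "-" as "this long before the end of the file", and accepts exactly one of end or duration; ~parse_time~ and ~resolve_range~ are hypothetical, and none of these choices has been decided.

#+BEGIN_SRC python
from typing import Optional


def parse_time(spec: str, file_duration: Optional[float] = None) -> float:
    """Parse "S", "M:S" or "H:M:S" (fractional seconds allowed) into seconds.

    A leading "-" means "this long before the end of the file" and
    therefore needs the file's duration.
    """
    spec = spec.strip()
    from_end = spec.startswith("-")
    if from_end:
        spec = spec[1:]
    parts = spec.split(":")
    if len(parts) > 3:
        raise ValueError(f"bad time spec: {spec!r}")
    seconds = 0.0
    for part in parts:
        seconds = seconds * 60 + float(part)  # float() raises ValueError on junk
    if not from_end:
        return seconds
    if file_duration is None:
        raise ValueError("times relative to the end need the file's duration")
    return max(0.0, file_duration - seconds)


def resolve_range(start: str, end: Optional[str] = None, duration: Optional[str] = None,
                  file_duration: Optional[float] = None) -> tuple[float, float]:
    """Accept start + end or start + duration (exactly one of the two)."""
    if (end is None) == (duration is None):
        raise ValueError("give exactly one of end or duration")
    s = parse_time(start, file_duration)
    e = parse_time(end, file_duration) if end is not None else s + parse_time(duration)
    if e <= s:
        raise ValueError("end must be after start")
    return s, e
#+END_SRC

Note that under this reading "10:00" is ten minutes (M:S), so the ~video2~ example would cover 600s to 3000s.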

*** Source / Feature Selection

A ~Source~ is an automation-driven /method/ of figuring out what bits of an *input video* to keep.

**** Input

A list of input videos, as output by the *Input* stage.

**** Options

- ~Source~-specific /options/ (eg min duration, threshold, etc. for laughter-detection)
- /working directory/
- minimum duration (see /Further Considerations/)

**** Output

A set of ≥0 /Feature/-type objects or equivalent JSON.

**** Further Considerations

At time of writing, the feature selection drivers output point timestamps rather than durations, although conceptually durations make more sense. It may be worthwhile to automatically promote any `point' timestamps to a duration.

- Pros :: makes the next step in the pipeline more uniform
- Cons :: will probably over-sample

Consequent consideration: does that mean we should let the user adjust the /pre-consolidation/ times too? Probably, but doing that in a UX-friendly way will likely take some effort.
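Promotion could be as simple as padding each point into a fixed window and passing ranges through untouched. The padding amounts here are placeholders, not values from the plan, and ~Feature~ and ~promote~ are hypothetical names. The padding is also where the over-sampling mentioned above comes from: every point grows by ~pre + post~ seconds.

#+BEGIN_SRC python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    """A detected feature as a time range in seconds (hypothetical structure)."""
    source: str
    start: float
    end: float


def promote(raw: dict, pre: float = 5.0, post: float = 5.0,
            file_duration: Optional[float] = None) -> Feature:
    """Turn {"time": t} into [t - pre, t + post]; pass {"start", "end"} through.

    The window is clamped to the start of the file and, if known, its end.
    """
    if "time" in raw:
        start, end = raw["time"] - pre, raw["time"] + post
    else:
        start, end = raw["start"], raw["end"]
    start = max(0.0, start)
    if file_duration is not None:
        end = min(end, file_duration)
    return Feature(raw["source"], float(start), float(end))
#+END_SRC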

*** Consolidation

The consolidation stage takes a list of timestamps (across videos) and /consolidates/ (merges) nearby times to create *clip definitions* across *sources*.

- input :: a list of *video files* with associated *timestamps* or *time ranges* (and their sources), eg in JSON:

#+BEGIN_SRC json
{ "videos": { "/path/to/videos/video1.mp4": [ { "time": 180, "source": "laughter-detect" },
                                              { "time": 187, "source": "laughter-detect" },
                                              { "time": 295, "source": "loud-detect" },
                                              { "time": 332, "source": "laughter-detect" }
                                            ],
              "/path/to/videos/video2.mp4": [ { "start": 45, "end": 130, "source": "segmenter" } ]
            }
}
#+END_SRC

**** Approach

The input list of feature times goes through a comparison process: if a feature /overlaps/ another (that is, starts or ends within the time period of another feature), the two features are consolidated into one. The comparison can use a /delta/: a small amount of time (eg 15s) within which `nearby' intervals are considered overlapping.
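This comparison is essentially interval merging with a tolerance. A minimal sketch, assuming each feature arrives as ~(video, start, end, source)~ with point timestamps given as zero-length ranges (start == end); ~consolidate~ is a hypothetical name. Features are grouped per video, sorted by start, and each one is compared only with the clip currently being built, so the pass is O(n log n).

#+BEGIN_SRC python
from collections import defaultdict


def consolidate(features, delta=15.0):
    """Merge features that overlap or lie within `delta` seconds, per video.

    Returns {video: [{"start", "end", "sources"}, ...]} in time order.
    """
    by_video = defaultdict(list)
    for video, start, end, source in features:
        by_video[video].append((start, end, source))

    clips = {}
    for video, items in by_video.items():
        merged = []
        for start, end, source in sorted(items):
            if merged and start <= merged[-1]["end"] + delta:
                # Overlapping or `nearby': extend the clip being built.
                merged[-1]["end"] = max(merged[-1]["end"], end)
                merged[-1]["sources"].add(source)
            else:
                merged.append({"start": start, "end": end, "sources": {source}})
        clips[video] = merged
    return clips
#+END_SRC

With the example input and a 15s delta, 180 and 187 merge into one clip while 295 and 332 (37s apart) stay separate.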


**** Options

- maximum delta between clips to be consolidated (default: 15s [rationale: Stetson-Harrison approach[fn:stetson]])
- maximum duration of a consolidated clip (default: 60s [rationale: max duration of YT shorts?])
- maximum number of consolidated clips to output (default: unlimited)
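The three options could be applied in a single merge pass. This sketch makes two assumptions the plan leaves open: a merge that would push a clip past the maximum duration starts a new clip instead (a single feature already longer than that is left as-is), and when the clip limit applies, the clips built from the most features are kept. ~ConsolidationOptions~ and ~consolidate_one_video~ are hypothetical names.

#+BEGIN_SRC python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConsolidationOptions:
    max_delta: float = 15.0          # seconds
    max_duration: float = 60.0       # seconds
    max_clips: Optional[int] = None  # None = unlimited


def consolidate_one_video(intervals, opts: Optional[ConsolidationOptions] = None):
    """Merge (start, end) intervals for one video, honouring the options above."""
    opts = opts or ConsolidationOptions()
    clips = []  # each entry: [start, end, number_of_features_merged]
    for start, end in sorted(intervals):
        if clips:
            cur = clips[-1]
            new_end = max(cur[1], end)
            close_enough = start <= cur[1] + opts.max_delta
            if close_enough and new_end - cur[0] <= opts.max_duration:
                cur[1] = new_end
                cur[2] += 1
                continue
        clips.append([start, end, 1])
    if opts.max_clips is not None:
        # Keep the best-supported clips (ties: earlier first), then restore time order.
        clips = sorted(clips, key=lambda c: (-c[2], c[0]))[:opts.max_clips]
        clips.sort(key=lambda c: c[0])
    return [(s, e) for s, e, _ in clips]
#+END_SRC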


[fn:stetson] The [[https://english.stackexchange.com/a/426717][Stetson-Harrison]] approach

*** Refinement

User-driven process of /selection/ and applying /operators/ to clips before final output.

**** 1. Selection

User choice of which clips to keep.

**** 2. Process

User applies (video) ~Processes~ to the clip(s):

- duration :: select start and end time (possibly before/after generated clip's boundaries)
- join :: further join clips which were not joined at consolidation stage
- filters :: eg sharpen / slomo / (de)saturate etc [stretch]
- split ::

*Note*: need to be careful not to reimplement an NLE here!
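One way to keep this small is to treat clips as plain time ranges and have each process return a new range, so nothing is rendered until the final *Highlights* step. A sketch of the /duration/ and /join/ processes under that assumption; ~Clip~, ~set_duration~ and ~join~ are hypothetical, and restricting joins to a single video is an assumption too.

#+BEGIN_SRC python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Clip:
    video: str
    start: float  # seconds
    end: float


def set_duration(clip: Clip, start: float, end: float, file_duration: float) -> Clip:
    """'duration': new boundaries, which may lie outside the generated clip's."""
    if not 0 <= start < end <= file_duration:
        raise ValueError("boundaries must lie within the file, with start < end")
    return replace(clip, start=start, end=end)


def join(a: Clip, b: Clip) -> Clip:
    """'join': one clip spanning both, including any gap between them."""
    if a.video != b.video:
        raise ValueError("this sketch only joins clips from the same video")
    return Clip(a.video, min(a.start, b.start), max(a.end, b.end))
#+END_SRC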

*** Highlights

Ultimate output.


meetings.org

@@ -132,3 +132,11 @@ Next steps for coming week:

[[https://roberthallam.com/files/highlights-infographic-process.png]]

*** Pipeline Planning

[[file:highlight-pipeline2.svg]]

[[https://roberthallam.com/files/highlight-pipeline2.svg]]

see [[file:highlight-pipeline-planning.org][highlight-pipeline-planning.org]]

