
highlight-pipeline-planning.org

* Highlight Generator Pipeline Planning
** Overview
[[file:~/downloads/highlightgeneration-process.svg]]
** Pipeline `API'
*** Input
User-driven selection of input videos.
- files :: /user-selected/ list of ≥1 input files to be processed
- (optional) time restriction :: start and end time (or: start time + duration?) of file to restrict highlight generation to (format: s & H:M:s ?) (see note)
- (optional, /stretch/) feature extractor mapping :: a map of files to feature extractors, e.g.:
#+BEGIN_SRC yaml
video1:
  path: /video/directory/video1.mkv
  feature_extractors:
    - laughter-detection
    - loud-moments
video2:
  path: /video/directory/video2.mp4
  start: 10:00
  end: 50:00
  feature_extractors:
    - word-recognition
#+END_SRC
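Once parsed from YAML, the mapping above is just nested dicts; a minimal sketch of normalizing it into per-file job specs (the ~FileSpec~ name and the empty-list default for extractors are assumptions, not decisions):

#+BEGIN_SRC python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical normalized form of one entry in the feature-extractor mapping.
@dataclass
class FileSpec:
    path: str
    feature_extractors: list = field(default_factory=list)
    start: Optional[str] = None  # time format still undecided, see note
    end: Optional[str] = None

def normalize_mapping(mapping: dict) -> list:
    """Turn the YAML mapping (name -> options dict) into a list of FileSpecs."""
    return [
        FileSpec(
            path=opts["path"],
            feature_extractors=opts.get("feature_extractors", []),
            start=opts.get("start"),
            end=opts.get("end"),
        )
        for opts in mapping.values()
    ]

mapping = {
    "video1": {"path": "/video/directory/video1.mkv",
               "feature_extractors": ["laughter-detection", "loud-moments"]},
    "video2": {"path": "/video/directory/video2.mp4",
               "start": "10:00", "end": "50:00",
               "feature_extractors": ["word-recognition"]},
}
specs = normalize_mapping(mapping)
#+END_SRC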
**** Time Restriction
To properly operate on a restricted range, this step can create a temporary media file (using e.g. ffmpeg) for the *Feature Selection* step to operate on.
Creating a temporary file can be avoided by ensuring each /Feature Selector/ respects a custom duration, but since some are third-party, their implementations may need updating.
*Discussion point*: pros and cons of updating 3P feature selectors
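For the temporary-file route, the cut itself can be delegated to ffmpeg's stream copy. A hedged sketch (the helper name is made up; ~-c copy~ avoids re-encoding but cuts only on keyframes, so boundaries are approximate):

#+BEGIN_SRC python
import subprocess

def build_cut_command(src: str, dst: str, start: str, end: str) -> list:
    """Build an ffmpeg invocation copying the [start, end] range of src to dst."""
    # -ss/-to after -i select the output range; -c copy skips re-encoding
    # (fast, but only keyframe-accurate).
    return ["ffmpeg", "-i", src, "-ss", start, "-to", end, "-c", "copy", dst]

cmd = build_cut_command("/video/directory/video2.mp4",
                        "/tmp/videohighlights/inputclip0001.mkv",
                        "10:00", "50:00")
# subprocess.run(cmd, check=True)  # uncomment to actually perform the cut
#+END_SRC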
**** Output
Conceptually, a list of files (either the original path, or the path to a temporary time-restricted version) and associated options for each. This will be either a language-specific object or the equivalent JSON.
Example:
#+BEGIN_SRC json
{
  "source_videos": [
    {
      "path": "/video/directory/video1.mkv",
      "feature_extractors": [
        "laughter-detection",
        "loud-moments"
      ]
    },
    {
      "path": "/tmp/videohighlights/inputclip0001.mkv",
      "feature_extractors": [
        "word-recognition"
      ]
    }
  ]
}
#+END_SRC
**** Further Considerations
- time specification formats -- start & end? start & duration? either? negative times for specifying time distance from /end/ of file?
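One way to keep the format question open is a single parser accepting both plain seconds and H:M:S, with a leading minus meaning distance from the end of the file. A sketch under those assumptions (the negative-time convention is one of the options floated above, not a decision):

#+BEGIN_SRC python
def parse_timespec(spec: str) -> float:
    """Parse 's', 'M:S', or 'H:M:S' into seconds.

    A leading '-' yields a negative value, which callers could interpret
    as an offset from the end of the file (one of the options above).
    """
    spec = spec.strip()
    sign = -1.0 if spec.startswith("-") else 1.0
    parts = spec.lstrip("+-").split(":")
    if len(parts) > 3:
        raise ValueError(f"bad time spec: {spec!r}")
    seconds = 0.0
    for part in parts:  # folds H:M:S left-to-right
        seconds = seconds * 60 + float(part)
    return sign * seconds
#+END_SRC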
*** Source / Feature Selection
A ~Source~ is an automation-driven /method/ of figuring out what bits of an *input video* to keep.
**** Input
A list of input videos, as in *Input*.
**** Options
- ~Source~-specific /options/ (e.g. min duration, threshold, etc. for laughter-detection)
- /working directory/
- minimum duration (see /Further Considerations/)
**** Output
A set of ≥0 /Feature/-type objects or equivalent JSON.
**** Further Considerations
At the time of writing, the feature selection drivers output timestamps, as opposed to durations; but conceptually durations make more sense. It may be worthwhile to automatically promote any `point' timestamps to a duration.
Pros: makes the next step in the pipeline more uniform
Cons: will probably over-sample
Consequent consideration: does that mean we should let the user adjust the /pre-consolidation/ times too? Probably, but doing that in a UX-friendly way will potentially take some doing.
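Promotion could be as simple as symmetric padding around the point timestamp; a minimal sketch (the 5-second pad and the dict shapes are assumptions):

#+BEGIN_SRC python
def promote_to_range(feature: dict, pad: float = 5.0) -> dict:
    """Return a start/end range for a feature, padding point timestamps.

    Features that already carry start/end pass through unchanged; a bare
    {"time": t} becomes [t - pad, t + pad], clamped at 0.
    """
    if "start" in feature and "end" in feature:
        return feature
    t = feature["time"]
    promoted = dict(feature)
    del promoted["time"]
    promoted["start"] = max(0.0, t - pad)
    promoted["end"] = t + pad
    return promoted
#+END_SRC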
*** Consolidation
The consolidation stage takes a list of timestamps (across videos) and /consolidates/ (merges) proximal times to create *clip definitions* across *sources*.
- input :: a list of *video files* with associated *timestamps* or *time ranges* (and their sources), e.g. in JSON:
#+BEGIN_SRC json
{
  "videos": {
    "/path/to/videos/video1.mp4": [
      { "time": 180, "source": "laughter-detect" },
      { "time": 187, "source": "laughter-detect" },
      { "time": 295, "source": "loud-detect" },
      { "time": 332, "source": "laughter-detect" }
    ],
    "/path/to/videos/video2.mp4": [
      { "start": 45, "end": 130, "source": "segmenter" }
    ]
  }
}
#+END_SRC
**** Approach
The input list of feature times goes through a comparison process: if a feature has /overlap/ with another (that is, starts or ends within the time period of another feature), those two features are consolidated into one. This comparison can be done with a /delta/, a small amount of time (e.g. 15s) within which `nearby' intervals can be considered overlapping.
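The overlap-with-delta comparison reduces to interval merging after sorting. A minimal sketch over one video's features (assumes point timestamps were already promoted to start/end ranges, and treats gaps ≤ delta as overlap; the max-duration and max-count options are left out for brevity):

#+BEGIN_SRC python
def consolidate(features: list, delta: float = 15.0) -> list:
    """Merge features whose ranges overlap or nearly do (gap <= delta).

    Each feature is a dict with "start", "end", and "source"; the result
    is a list of clip definitions carrying every contributing source.
    """
    clips = []
    for f in sorted(features, key=lambda f: f["start"]):
        if clips and f["start"] - clips[-1]["end"] <= delta:
            # Overlaps (or nearly overlaps) the previous clip: extend it.
            clips[-1]["end"] = max(clips[-1]["end"], f["end"])
            clips[-1]["sources"].add(f["source"])
        else:
            clips.append({"start": f["start"], "end": f["end"],
                          "sources": {f["source"]}})
    return clips

# Example: the video1 features above, promoted with a 5s pad.
features = [
    {"start": 175.0, "end": 185.0, "source": "laughter-detect"},
    {"start": 182.0, "end": 192.0, "source": "laughter-detect"},
    {"start": 290.0, "end": 300.0, "source": "loud-detect"},
    {"start": 327.0, "end": 337.0, "source": "laughter-detect"},
]
clips = consolidate(features)
#+END_SRC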
**** Options
- maximum delta between clips to be consolidated (default: 15s [rationale: Stetson-Harrison approach[fn:stetson]])
- maximum duration of a consolidated clip (default: 60s [rationale: max duration of YT shorts?])
- maximum number of consolidated clips to output (default: unlimited)
[fn:stetson] The [[https://english.stackexchange.com/a/426717][Stetson-Harrison]] approach
*** Refinement
User-driven process of /selecting/ clips and applying /operators/ to them before final output.
**** 1. Selection
User choice of which clips to keep.
**** 2. Process
User applies (video) ~Processes~ to the clip(s):
- duration :: select start and end time (possibly before/after the generated clip's boundaries)
- join :: further join clips which were not joined at the consolidation stage
- filters :: e.g. sharpen / slomo / (de)saturate etc. [stretch]
- split :: split a clip into multiple separate clips
*Note*: need to be careful not to reimplement an NLE here!
*** Highlights
Ultimate output.