TeaserGen: Generating Teasers For Long Documentaries
2024 · Weihan Xu, Paul Pu Liang, Haven Kim, et al.
Abstract
Teasers are an effective tool for promoting content in entertainment, commercial and educational fields. However, creating an effective teaser for a long video is challenging: it requires long-range multimodal modeling of the input video, and the output teaser must maintain audiovisual alignment, manage scene changes and preserve factual accuracy. Progress in this research direction has been hindered by the lack of a publicly available dataset. In this work, we present DocumentaryNet, a collection of 1,269 documentaries paired with their teasers, featuring multimodal data streams of video, speech, music, sound effects and narrations. With DocumentaryNet, we propose a new two-stage system for generating teasers from long documentaries. The proposed TeaserGen system first generates the teaser narration from the transcribed narration of the documentary using a pretrained large language model, and then selects the most relevant visual content to accompany
Related papers
- MMTrail: A Multimodal Trailer Video Dataset With Language And Music Descriptions (2024)
- MM-Narrator: Narrating Long-form Videos With Multimodal In-context Learning (2023)
- Sound-VECaps: Improving Audio Generation With Visual Enhanced Captions (2024)
- DeepSound-V1: Start To Think Step-by-step In The Audio Generation From Videos (2025)
- Effectively Obtaining Acoustic, Visual And Textual Data From Videos (2025)
- Diverse And Aligned Audio-to-video Generation Via Text-to-video Model Adaptation (2023)
- TalkVerse: Democratizing Minute-long Audio-driven Video Generation (2025)
- A Better Use Of Audio-visual Cues: Dense Video Captioning With Bi-modal Transformer (2020)