VT-SSum: A Benchmark Dataset for Video Transcript Segmentation and Summarization
2021 · Tengchao Lv, Lei Cui, Momcilo Vasilijevic, et al.
Abstract
Video transcript summarization is a fundamental task for video understanding. Conventional approaches to transcript summarization are usually built on summarization data for written language, such as news articles, and the domain discrepancy may degrade model performance on spoken text. In this paper, we present VT-SSum, a benchmark dataset of spoken language for video transcript segmentation and summarization, which includes 125K transcript-summary pairs from 9,616 videos. VT-SSum draws on videos from VideoLectures.NET, using the slide content as weak supervision to generate extractive summaries of the video transcripts. Experiments with a state-of-the-art deep learning approach show that a model trained on VT-SSum achieves a significant improvement on the AMI spoken text summarization benchmark. VT-SSum is publicly available at https://github.com/Dod-o/VT-SSum to support future research on video transcript segmentation and summarization tasks.
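The weak-supervision idea can be illustrated with a small sketch. The abstract does not say how transcript sentences are matched against slide content, so the token-overlap heuristic, the `label_sentences` function, and the 0.5 threshold below are all assumptions made for illustration, not the authors' procedure.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def label_sentences(transcript_sentences, slide_text, threshold=0.5):
    """Assign a 0/1 extractive label to each transcript sentence.

    Hypothetical heuristic: a sentence is labeled 1 when at least
    `threshold` of its distinct tokens also appear on the slide.
    """
    slide_vocab = set(tokenize(slide_text))
    labels = []
    for sentence in transcript_sentences:
        tokens = set(tokenize(sentence))
        # Empty sentences get zero overlap rather than a division error.
        overlap = len(tokens & slide_vocab) / len(tokens) if tokens else 0.0
        labels.append(1 if overlap >= threshold else 0)
    return labels


# Toy data, invented for this example.
slide = "Gradient descent: update weights along the negative gradient"
sentences = [
    "So, um, welcome back everyone.",
    "Gradient descent will update the weights along the negative gradient.",
    "Let me just find my notes.",
]
labels = label_sentences(sentences, slide)
summary = [s for s, keep in zip(sentences, labels) if keep]
```

On this toy input only the second sentence is kept, since it shares most of its vocabulary with the slide while the filler sentences share none.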