Role Of Audio In Audio-visual Video Summarization
2022 · Ibrahim Shoer, Berkay Kopru, Engin Erzin
Abstract
Video summarization attracts attention for efficient video representation, retrieval, and browsing to ease volume and traffic surge problems. Although video summarization mostly uses the visual channel for compaction, the benefits of audio-visual modeling have appeared in recent literature. The information coming from the audio channel can be a result of audio-visual correlation in the video content. In this study, we propose a new audio-visual video summarization framework integrating four ways of audio-visual information fusion with GRU-based and attention-based networks. Furthermore, we investigate a new explainability methodology using audio-visual canonical correlation analysis (CCA) to better understand and explain the role of audio in the video summarization task. Experimental evaluations on the TVSum dataset attain F1 score and Kendall-tau score improvements for audio-visual video summarization. Furthermore, splitting video content on TVSum and COGNIMUSE datasets based on audio-
Related papers
- Realizing Video Summarization From The Path Of Language-based Semantic Understanding (2024) · 0.00
- Audio Summarization With Audio Features And Probability Distribution Divergence (2020) · 0.00
- A Cascaded Architecture For Extractive Summarization Of Multimedia Content Via Audio-to-text Alignment (2025) · 0.00
- Multimodal Frame-scoring Transformer For Video Summarization (2022) · 0.00
- Hierarchical Multimodal Transformer To Summarize Videos (2021) · 14.69
- Vt-ssum: A Benchmark Dataset For Video Transcript Segmentation And Summarization (2021) · 2.76
- Attentive Fusion Enhanced Audio-visual Encoding For Transformer Based Robust Speech Recognition (2020) · 0.00
- Audio Visual Segmentation Through Text Embeddings (2025) · 1.81