AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels From Large Language Model
2024 · Jee-Weon Jung, Roshan Sharma, William Chen, et al.
Abstract
Abstractive speech summarization (SSUM) aims to generate human-like summaries from speech. Given variations in information captured and phrasing, recordings can be summarized in multiple ways. Therefore, it is more reasonable to consider a probabilistic distribution of all potential summaries rather than a single summary. However, conventional SSUM models are mostly trained and evaluated with a single ground-truth (GT) human-annotated deterministic summary for every recording. Generating multiple human references would be ideal to better represent the distribution statistically, but is impractical because annotation is expensive. We tackle this challenge by proposing AugSumm, a method to leverage large language models (LLMs) as a proxy for human annotators to generate augmented summaries for training and evaluation. First, we explore prompting strategies to generate synthetic summaries from ChatGPT. We validate the quality of synthetic summaries using multiple metrics including human evaluation …
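The abstract's core idea is that each recording should be paired with several plausible summaries (the GT plus LLM-generated ones) rather than one. A minimal Python sketch of what that could look like, assuming the synthetic summaries have already been obtained from an LLM; here they are hand-written placeholder strings. The names `expand_training_pairs`, `unigram_f1`, and `multi_reference_score`, the dictionary keys, and the simplified unigram F1 (a stand-in for ROUGE-1 without stemming) are hypothetical illustrations, not AugSumm's actual implementation or evaluation metric.

```python
from collections import Counter
from typing import List, Tuple


def expand_training_pairs(recordings: List[dict]) -> List[Tuple[str, str]]:
    """Hypothetical helper: pair each recording with its GT summary
    and with every synthetic (LLM-generated) summary."""
    pairs = []
    for rec in recordings:
        pairs.append((rec["audio_id"], rec["gt_summary"]))
        for synthetic in rec["synthetic_summaries"]:
            pairs.append((rec["audio_id"], synthetic))
    return pairs


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Simplified ROUGE-1-style F1: lowercase whitespace tokens, no stemming."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((hyp & ref).values())  # clipped token matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def multi_reference_score(hypothesis: str, references: List[str]) -> float:
    """Assumed convention: score against every reference, keep the best."""
    return max(unigram_f1(hypothesis, ref) for ref in references)


# Toy placeholder data, not from the paper.
recordings = [{
    "audio_id": "rec001",
    "gt_summary": "a man repairs a bike",
    # Stand-ins for summaries an LLM such as ChatGPT might produce.
    "synthetic_summaries": ["someone fixes a bicycle", "a man repairs his bike tire"],
}]
pairs = expand_training_pairs(recordings)
references = [summary for _, summary in pairs]
hyp = "someone fixes a bike"
print(round(unigram_f1(hyp, recordings[0]["gt_summary"]), 3))  # 0.444 (GT only)
print(round(multi_reference_score(hyp, references), 3))        # 0.75 (best of 3)
```

Scored against the single GT summary, the valid paraphrase is penalized; pooling the synthetic references rewards it. Taking the maximum over references is one common choice and an assumption here; averaging across references is another.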
Related papers
- Prompting Large Language Models With Audio For General-purpose Speech Summarization (2024) · 6.34
- Transfer Learning From Pre-trained Language Models Improves End-to-end Speech Summarization (2023) · 6.77
- Sentence-wise Speech Summarization: Task, Datasets, And End-to-end Modeling With LM Knowledge Distillation (2024) · 5.84
- Leverage Unlabeled Data For Abstractive Speech Summarization With Self-supervised Learning And Back-summarization (2020) · 2.26
- Realizing Video Summarization From The Path Of Language-based Semantic Understanding (2024) · 0.00
- Speech Vs. Transcript: Does It Matter For Human Annotators In Speech Summarization? (2024) · 4.98
- Data Augmentation For Spoken Language Understanding Via Joint Variational Generation (2018) · 10.61
- Speechllm-as-judges: Towards General And Interpretable Speech Quality Evaluation (2025) · 2.60