Topic Identification For Spontaneous Speech: Enriching Audio Features With Embedded Linguistic Information
2023 · Dejan Porjazovski, Tamás Grósz, Mikko Kurimo
Abstract
Traditional solutions for topic identification from audio rely on an automatic speech recognition (ASR) system to produce transcripts, which are then used as input to a text-based model. These approaches work well in high-resource scenarios, where there is sufficient data to train both components of the pipeline. However, in low-resource situations, the ASR system, even if available, produces low-quality transcripts, which in turn lead to a poor text-based classifier. Moreover, spontaneous speech containing hesitations can further degrade the performance of the ASR model. In this paper, we investigate alternatives to the standard text-only solutions by comparing audio-only techniques with hybrid techniques that jointly utilise text and audio features. The models, evaluated on spontaneous Finnish speech, demonstrate that purely audio-based solutions are a viable option when ASR components are not available, while the hybrid multi-modal solutions achieve the best results.
Related papers
- Low-resource Contextual Topic Identification On Speech (2018)
- Audio-based Linguistic Feature Extraction For Enhancing Multi-lingual And Low-resource Text-to-speech (2024)
- Reading Between The Waves: Robust Topic Segmentation Using Inter-sentence Audio Features (2026)
- Topic Identification For Speech Without ASR (2017)
- Multimodal Audio-textual Architecture For Robust Spoken Language Understanding (2023)
- Disentangling Speech And Non-speech Components For Building Robust Acoustic Models From Found Data (2019)
- Streaming Language Identification Using Combination Of Acoustic Representations And ASR Hypotheses (2020)
- Leveraging Acoustic Contextual Representation By Audio-textual Cross-modal Learning For Conversational ASR (2022)