Exploring Spoken Language Identification Strategies For Automatic Transcription Of Multilingual Broadcast And Institutional Speech
2024 · Martina Valente, Fabio Brugnara, Giovanni Morrone, et al.
Abstract
This paper addresses spoken language identification (SLI) and speech recognition of multilingual broadcast and institutional speech, real application scenarios that have rarely been addressed in the SLI literature. Observing that in these domains language changes are mostly associated with speaker changes, we propose a cascaded system consisting of speaker diarization followed by language identification, and compare it with more traditional language identification and language diarization systems. Results show that the proposed system often achieves lower language classification and language diarization error rates (up to 10% relative language diarization error reduction and 60% relative language confusion reduction) and leads to lower WERs on multilingual test sets (more than 8% relative WER reduction), while not negatively affecting speech recognition on monolingual audio (the absolute WER increase is between 0.1% and 0.7% w.r.t. monolingual ASR).
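The cascade described in the abstract can be illustrated with a minimal sketch. The abstract does not say how language decisions are pooled across a speaker's segments, so the duration-weighted averaging below is an assumption, not the authors' method. The input format (a list of segments carrying a `speaker` label, a `duration` in seconds, and per-language `lid_scores`) and the function name `assign_language_per_speaker` are likewise hypothetical; in practice these values would come from separate diarization and language identification models.

```python
from collections import defaultdict


def assign_language_per_speaker(segments):
    """Assign one language to each diarized speaker and propagate it to segments.

    Each segment is a dict with hypothetical keys:
      'speaker'    : speaker label from a diarization step
      'duration'   : segment length in seconds
      'lid_scores' : dict mapping language code to a per-segment LID score
    Returns a list with one language code per input segment, in order.
    """
    # Accumulate duration-weighted LID scores per speaker (assumed pooling rule).
    totals = defaultdict(lambda: defaultdict(float))
    for seg in segments:
        for lang, score in seg["lid_scores"].items():
            totals[seg["speaker"]][lang] += score * seg["duration"]

    # Pick the highest-scoring language for each speaker.
    speaker_lang = {
        spk: max(scores, key=scores.get) for spk, scores in totals.items()
    }

    # Every segment inherits the language of its speaker.
    return [speaker_lang[seg["speaker"]] for seg in segments]


if __name__ == "__main__":
    example = [
        {"speaker": "A", "duration": 5.0, "lid_scores": {"en": 0.9, "it": 0.1}},
        {"speaker": "A", "duration": 1.0, "lid_scores": {"en": 0.4, "it": 0.6}},
        {"speaker": "B", "duration": 4.0, "lid_scores": {"en": 0.2, "it": 0.8}},
    ]
    print(assign_language_per_speaker(example))
```

In this toy input, the short second segment of speaker A leans towards Italian when judged alone, but pooling over the speaker's longer first segment assigns English to both. This is the kind of per-segment confusion that tying language decisions to speaker identity is meant to reduce.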
Related papers
- Streaming Language Identification Using Combination Of Acoustic Representations And ASR Hypotheses (2020)
- Scdiar: A Streaming Diarization System Based On Speaker Change Detection And Speech Recognition (2025)
- Integration Of Speech Separation, Diarization, And Recognition For Multi-speaker Meetings: System Description, Comparison, And Analysis (2020)
- One Model To Rule Them All? Towards End-to-end Joint Speaker Diarization And Speech Recognition (2023)
- Streaming End-to-end Bilingual ASR Systems With Joint Language Identification (2020)
- Cross-domain Adaptation Of Spoken Language Identification For Related Languages: The Curious Case Of Slavic Languages (2020)
- Intent Recognition And Unsupervised Slot Identification For Low Resourced Spoken Dialog Systems (2021)
- Once More Diarization: Improving Meeting Transcription Systems Through Segment-level Speaker Reassignment (2024)