Whisper-LM: Improving ASR Models With Language Models For Low-resource Languages
2025 · Xabier de Zuazo, Eva Navas, Ibon Saratxaga, et al.
Abstract
Automatic speech recognition systems have advanced with the integration of multilingual and multitask models such as Whisper, which have shown a promising ability to understand and process speech across a wide range of languages. Despite their robustness, these models often fall short in handling the linguistic distinctions of minority languages. This study addresses this gap by integrating traditional and novel language models with fine-tuned Whisper models to raise their performance in less commonly studied languages. Through rigorous fine-tuning and evaluation across multiple datasets, we demonstrate substantial improvements in word error rate, particularly in low-resource scenarios. Our approach not only takes advantage of the extensive data Whisper was pre-trained on, but also complements its linguistic adaptability by incorporating language models. We obtained improvements up to 51% for in-distribution datasets and up to 34% for out-of-distribution sentences using
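The abstract reports its gains in terms of word error rate (WER). As a minimal sketch of how such figures are commonly computed, the Python below implements WER as word-level edit distance divided by reference length, plus a relative-reduction helper. The abstract does not say how its percentages are calculated, so treating them as relative WER reductions is an assumption, and the function names `word_error_rate` and `relative_reduction` are illustrative, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which WER dropped relative to the baseline (assumed metric)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - improved_wer) / baseline_wer
```

Under this assumed definition, a system whose WER falls from 0.20 to 0.10 shows a relative reduction of 0.5, that is, 50%.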
Related papers
- Multilingual DistilWhisper: Efficient Distillation Of Multi-task Speech Models Via Language-specific Experts (2023)
- Weighted Cross-entropy For Low-resource Languages In Multilingual Speech Recognition (2024)
- M2R-Whisper: Multi-stage And Multi-scale Retrieval Augmentation For Enhancing Whisper (2024)
- Whisper Turns Stronger: Augmenting wav2vec 2.0 For Superior ASR In Low-resource Languages (2024)
- Probing The Hidden Talent Of ASR Foundation Models For L2 English Oral Assessment (2025)
- Fine-tuning Whisper On Low-resource Languages For Real-world Applications (2024)
- Enhancing Indonesian Automatic Speech Recognition: Evaluating Multilingual Models With Diverse Speech Variabilities (2024)
- Adapting Whisper For Code-switching Through Encoding Refining And Language-aware Decoding (2024)