M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper
2024 · Jiaming Zhou, Shiwan Zhao, Jiabei He, et al.
Abstract
State-of-the-art models like OpenAI's Whisper exhibit strong performance in multilingual automatic speech recognition (ASR), but they still face challenges in accurately recognizing diverse subdialects. In this paper, we propose M2R-whisper, a novel multi-stage and multi-scale retrieval augmentation approach designed to enhance ASR performance in low-resource settings. Building on the principles of in-context learning (ICL) and retrieval-augmented techniques, our method employs sentence-level ICL in the pre-processing stage to harness contextual information, while integrating token-level k-Nearest Neighbors (kNN) retrieval as a post-processing step to further refine the final output distribution. By synergistically combining sentence-level and token-level retrieval strategies, M2R-whisper effectively mitigates various types of recognition errors. Experiments conducted on Mandarin and subdialect datasets, including AISHELL-1 and KeSpeech, demonstrate substantial improvements in ASR accuracy.
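The abstract names two retrieval steps without giving their formulas. The sketch below illustrates both in plain Python under stated assumptions. The token-level step follows the common kNN-LM recipe: distance-weighted nearest neighbours from a datastore of (hidden state, next token) pairs are interpolated with the model's own distribution by a weight `lam`. The sentence-level step simply picks the stored utterance with the highest cosine similarity and returns its transcript for use as context. Function names, the datastore layout, and the hyperparameters `k`, `temperature`, and `lam` are hypothetical illustrations, not the paper's actual implementation.

```python
import math
from collections import defaultdict

def l2_sq(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_distribution(query, datastore, k=2, temperature=1.0):
    """Token distribution built from the k nearest datastore entries.

    datastore: list of (key_vector, token) pairs, e.g. decoder hidden states
    paired with the ground-truth next token (an assumed layout).
    """
    neighbors = sorted(datastore, key=lambda kv: l2_sq(query, kv[0]))[:k]
    # Softmax over negative distances: closer neighbours get more weight.
    logits = [-l2_sq(query, key) / temperature for key, _ in neighbors]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    probs = defaultdict(float)
    for w, (_, tok) in zip(weights, neighbors):
        probs[tok] += w / total  # neighbours sharing a token pool their mass
    return dict(probs)

def interpolate(p_model, p_knn, lam=0.3):
    """p(y) = lam * p_knn(y) + (1 - lam) * p_model(y) over the union of tokens."""
    tokens = set(p_model) | set(p_knn)
    return {t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_model.get(t, 0.0)
            for t in tokens}

def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve_prompt(utt_embedding, examples):
    """Sentence-level retrieval: transcript of the most similar stored utterance.

    examples: list of (utterance_embedding, transcript) pairs (assumed).
    """
    best = max(examples, key=lambda ex: cosine(utt_embedding, ex[0]))
    return best[1]

# Usage with toy data: both nearest neighbours carry token "a",
# so interpolation shifts probability mass toward "a".
datastore = [([0.0, 0.0], "a"), ([0.1, 0.0], "a"), ([5.0, 5.0], "b")]
p_knn = knn_distribution([0.0, 0.0], datastore, k=2)
p_final = interpolate({"a": 0.4, "b": 0.6}, p_knn, lam=0.5)
prompt = retrieve_prompt([0.9, 0.1], [([1.0, 0.0], "hello"), ([0.0, 1.0], "goodbye")])
```

Interpolating rather than replacing the model's distribution keeps the original prediction as a fallback when the datastore contains no close neighbours. How the retrieved transcript is actually inserted into Whisper's decoding context is not specified in this abstract.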
Related papers
- Multilingual Distilwhisper: Efficient Distillation Of Multi-task Speech Models Via Language-specific Experts (2023) · 8.09
- Whisper-lm: Improving ASR Models With Language Models For Low-resource Languages (2025) · 3.29
- A Multitask Training Approach To Enhance Whisper With Contextual Biasing And Open-vocabulary Keyword Spotting (2023) · 0.00
- Whisper Turns Stronger: Augmenting Wav2vec 2.0 For Superior ASR In Low-resource Languages (2024) · 0.00
- Probing The Hidden Talent Of ASR Foundation Models For L2 English Oral Assessment (2025) · 0.00
- Adapting Whisper For Code-switching Through Encoding Refining And Language-aware Decoding (2024) · 0.00
- Simul-whisper: Attention-guided Streaming Whisper With Truncation Detection (2024) · 6.34
- Enhancing Indonesian Automatic Speech Recognition: Evaluating Multilingual Models With Diverse Speech Variabilities (2024) · 4.52