Improving Synthetic Data Training For Contextual Biasing Models With A Keyword-aware Cost Function
2025 · Chin Yuen Kwok, Jia Qi Yip, Eng Siong Chng
Abstract
Rare word recognition can be improved by adapting ASR models to synthetic data that includes these words. Further improvements can be achieved through contextual biasing, which trains a biasing module and adds it to the model architecture to prioritize rare words. While training the module on synthetic rare word data is more effective than using non-rare-word data, it can lead to overfitting due to artifacts in the synthetic audio. To address this, we enhance the TCPGen-based contextual biasing approach and propose a keyword-aware loss function that additionally focuses on biased words when training biasing modules. This loss includes a masked cross-entropy term for biased word prediction and a binary classification term for detecting biased word positions. The two terms complement each other in supporting the decoding of biased words during inference. By adapting Whisper to 10 hours of synthetic data, our method reduced the word error rate on the NSC Part 2 test set from 29.71% to 11.81%.
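The abstract names the two extra loss terms but does not say how they are combined. The sketch below is one plausible reading, not the authors' implementation: a standard token-level cross-entropy, plus a cross-entropy averaged only over biased-word positions, plus a binary cross-entropy on a per-position "is this a biased word?" logit. The function name `keyword_aware_loss`, the weights `alpha` and `beta`, and the additive combination are all assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def bce_with_logits(z, y):
    """Stable binary cross-entropy for a single logit z and label y in {0, 1}."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def keyword_aware_loss(token_logits, targets, bias_mask, detect_logits,
                       alpha=1.0, beta=1.0):
    """Hypothetical keyword-aware loss (assumed form, not from the paper).

    token_logits:  per-position vocabulary logits, shape [T][V]
    targets:       gold token ids, length T
    bias_mask:     1 where the position belongs to a biased word, else 0
    detect_logits: per-position logit for "this position is a biased word"
    alpha, beta:   assumed weights for the two extra terms
    """
    # Per-position negative log-likelihood of the gold token.
    nll = [-log_softmax(row)[t] for row, t in zip(token_logits, targets)]

    # Standard cross-entropy over all positions.
    ce_all = sum(nll) / len(nll)

    # Masked cross-entropy: average over biased-word positions only.
    n_biased = sum(bias_mask)
    ce_masked = (sum(l * m for l, m in zip(nll, bias_mask)) / n_biased
                 if n_biased else 0.0)

    # Binary classification term: detect which positions are biased words.
    bce = sum(bce_with_logits(z, y)
              for z, y in zip(detect_logits, bias_mask)) / len(bias_mask)

    return ce_all + alpha * ce_masked + beta * bce


# Toy usage: 3 positions, vocabulary of 4, positions 1 and 2 are biased words.
toy_logits = [[2.0, 0.1, 0.0, -1.0],
              [0.0, 1.5, 0.2, 0.0],
              [0.3, 0.0, 0.0, 2.2]]
loss = keyword_aware_loss(toy_logits, targets=[0, 1, 3],
                          bias_mask=[0, 1, 1],
                          detect_logits=[-1.0, 2.0, 0.5])
```

Under this reading, the masked term pushes the model to predict the right tokens inside biased words, while the binary term teaches it where biased words occur. In a real training setup these terms would be computed on tensors with an autodiff framework rather than Python lists.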
Related papers
- Improving Contextual Recognition Of Rare Words With An Alternate Spelling Prediction Model (2022) 7.81
- Contextual Biasing To Improve Domain-specific Custom Vocabulary Audio Transcription Without Explicit Fine-tuning Of Whisper Model (2024) 4.52
- Improving Neural Biasing For Contextual Speech Recognition By Early Context Injection And Text Perturbation (2024) 8.09
- Minimising Biasing Word Errors For Contextual ASR With The Tree-constrained Pointer Generator (2022) 6.77
- Robust Acoustic And Semantic Contextual Biasing In Neural Transducers For Speech Recognition (2023) 8.60
- Adaptive Contextual Biasing For Transducer Based Streaming Speech Recognition (2023) 7.16
- Contextualized End-to-end Automatic Speech Recognition With Intermediate Biasing Loss (2024) 5.84
- Fast Context-biasing For CTC And Transducer ASR Models With CTC-based Word Spotter (2024) 2.26