Contextualized End-to-end Speech Recognition With Contextual Phrase Prediction Network
2023 · Kaixun Huang, Ao Zhang, Zhanheng Yang, et al.
Abstract
Contextual information plays a crucial role in speech recognition, and incorporating it into end-to-end speech recognition models has drawn immense interest recently. However, previous deep bias methods lacked explicit supervision for the bias task. In this study, we introduce a contextual phrase prediction network for an attention-based deep bias method. This network uses contextual embeddings to predict the context phrases that occur in an utterance, and computes a bias loss that assists in training the contextualized model. Our method achieves a significant word error rate (WER) reduction across various end-to-end speech recognition models. Experiments on the LibriSpeech corpus show that the proposed model obtains a 12.1% relative WER improvement over the baseline model, and that the WER on context phrases decreases by a relative 40.5%. Moreover, by applying a context phrase filtering strategy, we effectively eliminate the WER degradation that occurs when using a larger biasing list.
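The abstract reports its gains as relative WER changes. As a reading aid, here is a minimal sketch of how a relative reduction is usually computed from two absolute WER values. The function name and the example numbers are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Computed as 100 * (baseline - new) / baseline, so a positive value
    means the new system makes fewer errors than the baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical values for illustration only (not results from the paper):
# a baseline at 10.0% WER and a new system at 8.0% WER.
print(round(relative_wer_reduction(10.0, 8.0), 1))  # 20.0
```

A relative figure depends on the baseline: the same absolute drop of 2 points is a 20% relative reduction from 10.0% WER but a 40% relative reduction from 5.0% WER.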
Related papers
- Contextualized End-to-end Automatic Speech Recognition With Intermediate Biasing Loss (2024) 5.84
- Contextualized Streaming End-to-end Speech Recognition With Trie-based Deep Biasing And Shallow Fusion (2021) 13.44
- Adaptive Contextual Biasing For Transducer Based Streaming Speech Recognition (2023) 7.16
- Improving End-to-end Contextual Speech Recognition With Fine-grained Contextual Knowledge Selection (2022) 10.74
- Contextualized Automatic Speech Recognition With Attention-based Bias Phrase Boosted Beam Search (2024) 8.60
- Improving Neural Biasing For Contextual Speech Recognition By Early Context Injection And Text Perturbation (2024) 8.09
- Towards Contextual Spelling Correction For Customization Of End-to-end Speech Recognition Systems (2022) 9.92
- Robust Acoustic And Semantic Contextual Biasing In Neural Transducers For Speech Recognition (2023) 8.60