Contextualized Automatic Speech Recognition With Attention-based Bias Phrase Boosted Beam Search
2024 · Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, et al.
Abstract
End-to-end (E2E) automatic speech recognition (ASR) methods exhibit remarkable performance. However, since the performance of such methods is intrinsically linked to the context present in the training data, E2E-ASR methods do not perform as desired for unseen user contexts (e.g., technical terms, personal names, and playlists). Thus, E2E-ASR methods must be easily contextualized by the user or developer. This paper proposes an attention-based contextual biasing method that can be customized using an editable phrase list (referred to as a bias list). The proposed method can be trained effectively by combining a bias phrase index loss and special tokens to detect the bias phrases in the input speech data. In addition, to improve the contextualization performance during inference further, we propose a bias phrase boosted (BPB) beam search algorithm based on the bias phrase index probability. Experimental results demonstrate that the proposed method consistently improves the word error ra
Related papers
- Locality Enhanced Dynamic Biasing And Sampling Strategies For Contextual ASR (2024) · 0.00
- Robust Acoustic And Semantic Contextual Biasing In Neural Transducers For Speech Recognition (2023) · 8.60
- Contextualized End-to-end Automatic Speech Recognition With Intermediate Biasing Loss (2024) · 5.84
- End-to-end Contextual ASR Based On Posterior Distribution Adaptation For Hybrid CTC/Attention System (2022) · 0.00
- Contextualized End-to-end Speech Recognition With Contextual Phrase Prediction Network (2023) · 10.48
- Towards Contextual Spelling Correction For Customization Of End-to-end Speech Recognition Systems (2022) · 9.92
- Adaptive Contextual Biasing For Transducer Based Streaming Speech Recognition (2023) · 7.16
- Improving Neural Biasing For Contextual Speech Recognition By Early Context Injection And Text Perturbation (2024) · 8.09