Two Stage Contextual Word Filtering For Context Bias In Unified Streaming And Non-streaming Transducer
2023 · Zhanheng Yang, Sining Sun, Xiong Wang, et al.
Abstract
It is difficult for an E2E ASR system to recognize words, such as entities, that appear infrequently in the training data. A widely used method to mitigate this issue is feeding contextual information into the acoustic model. Previous work has shown that a compact and accurate contextual list can boost performance significantly. In this paper, we propose an efficient approach to obtain a high-quality contextual list for a unified streaming/non-streaming E2E model. Specifically, we use the phone-level streaming output to first filter the predefined contextual word list, then fuse the filtered list into the non-causal encoder and decoder to generate the final recognition results. Our approach improves the accuracy of the contextual ASR system and speeds up the inference process. Experiments on two datasets demonstrate over 20% CER reduction compared to the baseline system. Meanwhile, the RTF of our system can be stabilized within 0.15 when the size of the contextual word list grows over 6,0
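The first-stage filtering idea can be illustrated with a minimal sketch. The abstract does not state the matching criterion, so everything here is an assumption made for illustration: the function names, the toy lexicon, the use of approximate substring edit distance over phone sequences, and the `max_error_rate` threshold are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def min_substring_edit_distance(pattern: Sequence[str], sequence: Sequence[str]) -> int:
    """Smallest edit distance between `pattern` and any contiguous span of `sequence`."""
    # Approximate substring matching: the first row is all zeros,
    # so a match may start at any position in `sequence` at no cost.
    prev = [0] * (len(sequence) + 1)
    for i, p in enumerate(pattern, start=1):
        curr = [i] + [0] * len(sequence)
        for j, s in enumerate(sequence, start=1):
            cost = 0 if p == s else 1
            curr[j] = min(prev[j] + 1,         # pattern phone missing from the hypothesis
                          curr[j - 1] + 1,     # extra phone in the hypothesis
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    # The match may also end anywhere, so take the minimum over the last row.
    return min(prev)


def filter_context_list(phone_hyp: Sequence[str],
                        lexicon: Dict[str, List[str]],
                        max_error_rate: float = 0.34) -> List[str]:
    """Keep words whose phone sequence approximately occurs in the streaming phone output.

    `max_error_rate` is a hypothetical tolerance (edits per phone), not a value from the paper.
    """
    kept = []
    for word, phones in lexicon.items():
        if not phones:
            continue
        distance = min_substring_edit_distance(phones, phone_hyp)
        if distance / len(phones) <= max_error_rate:
            kept.append(word)
    return kept


# Toy data: a phone-level streaming hypothesis and a small contextual word list.
phone_hyp = ["n", "i", "h", "ao", "zh", "ang", "s", "an"]
lexicon = {
    "zhang san": ["zh", "ang", "s", "an"],
    "li si": ["l", "i", "s", "i"],
    "wang wu": ["w", "ang", "w", "u"],
}
print(filter_context_list(phone_hyp, lexicon))
```

In this sketch only the words that survive the filter would be passed on to the second, non-streaming pass, which is one plausible reason a filtered list could keep inference cost from growing with the size of the full contextual list.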
Related papers
- Adaptive Contextual Biasing For Transducer Based Streaming Speech Recognition (2023) 7.16
- Contextualized Streaming End-to-end Speech Recognition With Trie-based Deep Biasing And Shallow Fusion (2021) 13.44
- Towards Effective And Compact Contextual Representation For Conformer Transducer Speech Recognition Systems (2023) 7.16
- Improving RNN-T ASR Accuracy Using Context Audio (2020) 5.84
- Improving Contextual Recognition Of Rare Words With An Alternate Spelling Prediction Model (2022) 7.81
- Robust Acoustic And Semantic Contextual Biasing In Neural Transducers For Speech Recognition (2023) 8.60
- Improving Neural Biasing For Contextual Speech Recognition By Early Context Injection And Text Perturbation (2024) 8.09
- Cif-based Collaborative Decoding For End-to-end Contextual Speech Recognition (2020) 9.76