Improving Contextual Recognition Of Rare Words With An Alternate Spelling Prediction Model
2022 · Jennifer Drexler Fox, Natalie Delworth
Abstract
Contextual ASR, which takes a list of bias terms as input along with audio, has drawn recent interest as ASR use becomes more widespread. We are releasing contextual biasing lists to accompany the Earnings21 dataset, creating a public benchmark for this task. We present baseline results on this benchmark using a pretrained end-to-end ASR model from the WeNet toolkit. We show results for shallow fusion contextual biasing applied to two different decoding algorithms. Our baseline results confirm observations that end-to-end models struggle in particular with words that are rarely or never seen during training, and that existing shallow fusion techniques do not adequately address this problem. We propose an alternate spelling prediction model that improves recall of rare words by 34.7% relative and of out-of-vocabulary words by 97.2% relative, compared to contextual biasing without alternate spellings. This model is conceptually similar to ones used in prior work, but is simpler to implem
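The relative recall gains quoted in the abstract can be read as follows. This is a minimal sketch, not the authors' scoring code: the function names, the toy transcripts, and the bag-of-words matching of single-word bias terms are all assumptions made for illustration, and the numbers it produces are not results from the paper.

```python
from collections import Counter


def bias_term_recall(references, hypotheses, bias_terms):
    """Fraction of bias-term occurrences in the references that the
    hypotheses also contain (per-utterance bag-of-words match, no alignment)."""
    hits = 0
    total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_counts = Counter(w for w in ref.lower().split() if w in bias_terms)
        hyp_counts = Counter(hyp.lower().split())
        for word, n in ref_counts.items():
            total += n
            hits += min(n, hyp_counts[word])
    return hits / total if total else 0.0


def relative_improvement(baseline, improved):
    """Relative change of `improved` over `baseline`, e.g. 0.347 for 34.7%."""
    return (improved - baseline) / baseline


# Hypothetical toy data: two utterances, two lowercase single-word bias terms.
bias_terms = {"ebitda", "zyntrix"}
references = ["acme reported ebitda growth", "zyntrix beat guidance"]
baseline_hyps = ["acme reported a bit da growth", "zyntrix beat guidance"]
improved_hyps = ["acme reported ebitda growth", "zyntrix beat guidance"]

baseline_recall = bias_term_recall(references, baseline_hyps, bias_terms)
improved_recall = bias_term_recall(references, improved_hyps, bias_terms)
gain = relative_improvement(baseline_recall, improved_recall)
print(f"baseline={baseline_recall:.2f} improved={improved_recall:.2f} relative gain={gain:.1%}")
```

A relative figure divides the recall change by the baseline recall, so it can be large even when the absolute change is modest, which matters most for out-of-vocabulary words whose baseline recall is low.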
Related papers
- Towards Contextual Spelling Correction For Customization Of End-to-end Speech Recognition Systems (2022)
- Improving Neural Biasing For Contextual Speech Recognition By Early Context Injection And Text Perturbation (2024)
- Robust Acoustic And Semantic Contextual Biasing In Neural Transducers For Speech Recognition (2023)
- Improving Synthetic Data Training For Contextual Biasing Models With A Keyword-aware Cost Function (2025)
- Contextualized Streaming End-to-end Speech Recognition With Trie-based Deep Biasing And Shallow Fusion (2021)
- Contextualized End-to-end Automatic Speech Recognition With Intermediate Biasing Loss (2024)
- ED-CEC: Improving Rare Word Recognition Using ASR Postprocessing Based On Error Detection And Context-aware Error Correction (2023)
- Contextualized End-to-end Speech Recognition With Contextual Phrase Prediction Network (2023)