Towards Personalization Of CTC Speech Recognition Models With Contextual Adapters And Adaptive Boosting
2022 · Saket Dingliwal, Monica Sunkara, Sravan Bodapati, et al.
Abstract
End-to-end speech recognition models trained using joint Connectionist Temporal Classification (CTC)-Attention loss have gained popularity recently. In these models, a non-autoregressive CTC decoder is often used at inference time due to its speed and simplicity. However, such models are hard to personalize because their conditional independence assumption prevents output tokens from previous time steps from influencing future predictions. To tackle this, we propose a novel two-way approach that first biases the encoder with attention over a predefined list of rare long-tail and out-of-vocabulary (OOV) words and then uses dynamic boosting and a phone alignment network during decoding to further bias the subword predictions. We evaluate our approach on the open-source VoxPopuli dataset and on in-house medical datasets, showing a 60% improvement in F1 score on domain-specific rare words over a strong CTC baseline.
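To make the conditional independence point concrete, the following is a minimal sketch of greedy CTC decoding with a fixed score bonus for a set of "boosted" token ids. It is not the paper's method: the abstract does not specify how dynamic boosting or the phone alignment network work, so a simple static per-token bonus is swapped in here for illustration. The function name `ctc_greedy_decode`, the parameters `boost_ids` and `boost`, the blank index, and the toy scores are all assumptions.

```python
from typing import AbstractSet, List, Sequence

BLANK = 0  # assumed index of the CTC blank token


def ctc_greedy_decode(
    log_probs: Sequence[Sequence[float]],
    boost_ids: AbstractSet[int] = frozenset(),
    boost: float = 0.0,
) -> List[int]:
    """Greedy CTC decoding with an optional static per-token bonus.

    Every frame is decoded independently of the tokens chosen for other
    frames, so in this sketch the only place to inject a bias is the
    per-frame score of each token.
    """
    best_path = []
    for frame in log_probs:
        # Hypothetical boosting rule: add a constant to the listed tokens.
        scores = [
            s + boost if i in boost_ids else s
            for i, s in enumerate(frame)
        ]
        best_path.append(max(range(len(scores)), key=scores.__getitem__))

    # Standard CTC collapse: merge repeated tokens, then drop blanks.
    decoded = []
    prev = None
    for tok in best_path:
        if tok != prev and tok != BLANK:
            decoded.append(tok)
        prev = tok
    return decoded


# Toy example: 4 frames over a vocabulary of {0: blank, 1, 2, 3}.
frames = [
    [-2.0, -0.1, -3.0, -3.0],
    [-2.0, -0.2, -3.0, -3.0],
    [-0.1, -3.0, -3.0, -3.0],
    [-3.0, -3.0, -0.7, -0.9],
]
baseline = ctc_greedy_decode(frames)
boosted = ctc_greedy_decode(frames, boost_ids={3}, boost=0.5)
```

In this toy input the last frame narrowly prefers token 2 over token 3, so the unboosted result is `[1, 2]`, while a bonus of 0.5 on token 3 flips it to `[1, 3]`. Because nothing links one frame's decision to the next, a bias of this kind has to be applied frame by frame, which is the limitation the abstract attributes to CTC decoders.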
Related papers
- Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages (2023) 4.52
- Contextual Adapters For Personalized Speech Recognition In Neural Transducers (2022) 12.47
- Personalization Of CTC-based End-to-end Speech Recognition Using Pronunciation-driven Subword Tokenization (2023) 6.77
- Fast Contextual Adaptation With Neural Associative Memory For On-device Personalized Speech Recognition (2021) 9.76
- End-to-end Contextual ASR Based On Posterior Distribution Adaptation For Hybrid CTC/Attention System (2022) 0.00
- Adaptive Contextual Biasing For Transducer Based Streaming Speech Recognition (2023) 7.16
- Advancing Connectionist Temporal Classification With Attention Modeling (2018) 11.49
- Fast Context-biasing For CTC And Transducer ASR Models With CTC-based Word Spotter (2024) 2.26