Mask The Bias: Improving Domain-adaptive Generalization Of CTC-based ASR With Internal Language Model Estimation
2023 · Nilaksh Das, Monica Sunkara, Sravan Bodapati, et al.
Abstract
End-to-end ASR models trained on large amounts of data tend to be implicitly biased towards the language semantics of the training data. Internal language model estimation (ILME) has been proposed to mitigate this bias for autoregressive models such as attention-based encoder-decoder and RNN-T. Typically, ILME is performed by modularizing the acoustic and language components of the model architecture, and eliminating the acoustic input to perform log-linear interpolation with the text-only posterior. However, for CTC-based ASR, it is not as straightforward to decouple the model into such acoustic and language components, as CTC log-posteriors are computed in a non-autoregressive manner. In this work, we propose a novel ILME technique for CTC-based ASR models. Our method iteratively masks the audio timesteps to estimate a pseudo log-likelihood of the internal LM by accumulating log-posteriors only for the masked timesteps. Extensive evaluation across multiple out-of-domain datasets reveals t…
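The abstract describes the CTC-specific ILME step only at a high level. The Python sketch below is one hypothetical reading of it, not the authors' implementation. It makes several assumptions that do not come from the source: masking is done by zeroing feature vectors, timesteps are masked in an interleaved pattern (pass `i` masks every `t` with `t % num_masks == i`), and the ILM-corrected score is a per-frame log-linear interpolation `log p_CTC - ilm_weight * log p_ILM`. The function names (`estimate_ilm_log_posteriors`, `pseudo_log_likelihood`, `ilm_corrected`) and the `toy_model` stand-in are all illustrative. An external target-domain LM would normally be added on top of this score during beam search, which is not shown here.

```python
import math
from typing import Callable, List, Sequence

Frame = List[float]           # one acoustic feature vector
LogProbs = List[List[float]]  # T x V matrix of per-frame log-posteriors
CTCModel = Callable[[List[Frame]], LogProbs]


def log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one frame's logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def estimate_ilm_log_posteriors(model: CTCModel, frames: List[Frame],
                                num_masks: int) -> LogProbs:
    """Hypothetical masking-based ILM estimate for a non-autoregressive model.

    Pass i zeroes every timestep t with t % num_masks == i (assumed mask
    scheme), runs the model once, and keeps log-posteriors only at the masked
    timesteps. After num_masks passes each timestep has been masked exactly
    once, so each row reflects a prediction from surrounding context alone.
    """
    if not 1 <= num_masks <= len(frames):
        raise ValueError("num_masks must be between 1 and the number of frames")
    dim = len(frames[0])
    ilm: LogProbs = [[] for _ in frames]
    for i in range(num_masks):
        masked_input = [[0.0] * dim if t % num_masks == i else list(f)
                        for t, f in enumerate(frames)]
        log_post = model(masked_input)
        for t in range(i, len(frames), num_masks):  # masked timesteps only
            ilm[t] = list(log_post[t])
    return ilm


def pseudo_log_likelihood(ilm: LogProbs, alignment: Sequence[int]) -> float:
    """Accumulate ILM log-posteriors along a frame-level label sequence."""
    return sum(ilm[t][k] for t, k in enumerate(alignment))


def ilm_corrected(ctc: LogProbs, ilm: LogProbs, ilm_weight: float) -> LogProbs:
    """Per-frame log-linear interpolation: log p_CTC - ilm_weight * log p_ILM."""
    return [[c - ilm_weight * b for c, b in zip(c_row, b_row)]
            for c_row, b_row in zip(ctc, ilm)]


def toy_model(frames: List[Frame]) -> LogProbs:
    """Toy stand-in for a CTC model: each output also mixes in its neighbours,
    so a masked frame's output depends only on context."""
    T = len(frames)
    out = []
    for t in range(T):
        ctx = [0.0] * len(frames[t])
        for n in (t - 1, t + 1):
            if 0 <= n < T:
                ctx = [c + 0.5 * x for c, x in zip(ctx, frames[n])]
        out.append(log_softmax([x + c for x, c in zip(frames[t], ctx)]))
    return out


# Usage with four 3-dimensional frames and two interleaved masking passes.
frames = [[1.0, 0.0, -1.0], [0.0, 2.0, 0.0], [0.5, 0.5, 0.0], [-1.0, 0.0, 1.0]]
ctc = toy_model(frames)
ilm = estimate_ilm_log_posteriors(toy_model, frames, num_masks=2)
fused = ilm_corrected(ctc, ilm, ilm_weight=0.3)
print(pseudo_log_likelihood(ilm, [1, 1, 0, 2]))
```

Interleaving the masks keeps each masked frame's immediate neighbours visible, so the model still has context to predict from. With `num_masks` passes, the cost is `num_masks` extra forward passes per utterance.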
Related papers
- Internal Language Model Estimation For Domain-adaptive End-to-end Speech Recognition (2020) · 13.44
- Internal Language Model Training For Domain-adaptive End-to-end Speech Recognition (2021) · 11.39
- Improved Mask-CTC For Non-autoregressive End-to-end ASR (2020) · 11.76
- Enhancing Code-switching Speech Recognition With Interactive Language Biases (2023) · 9.92
- Internal Language Model Estimation Based Adaptive Language Model Fusion For Domain Adaptation (2022) · 0.00
- Adaptable End-to-end ASR Models Using Replaceable Internal LMs And Residual Softmax (2023) · 0.00
- Internal Language Model Estimation Through Explicit Context Vector Learning For Attention-based Encoder-decoder ASR (2022) · 7.50
- Non-autoregressive Error Correction For CTC-based ASR With Phone-conditioned Masked LM (2022) · 5.84