Adaptable End-to-end ASR Models Using Replaceable Internal LMs And Residual Softmax
2023 · Keqi Deng, Philip C. Woodland
Abstract
End-to-end (E2E) automatic speech recognition (ASR) implicitly learns the token sequence distribution of paired audio-transcript training data. However, it still suffers from domain shifts from training to testing, and domain adaptation is still challenging. To alleviate this problem, this paper designs a replaceable internal language model (RILM) method, which makes it feasible to directly replace the internal language model (LM) of E2E ASR models with a target-domain LM in the decoding stage when a domain shift is encountered. Furthermore, this paper proposes a residual softmax (R-softmax) that is designed for CTC-based E2E ASR models to adapt to the target domain without re-training during inference. For E2E ASR models trained on the LibriSpeech corpus, experiments showed that the proposed methods gave a 2.6% absolute WER reduction on the Switchboard data and a 1.0% WER reduction on the AESRC2020 corpus while maintaining intra-domain ASR results.
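The abstract reports results as WER reductions. As a minimal sketch of how such figures are usually computed, the code below assumes the standard definition of word error rate (word-level edit distance divided by reference length). The function names `word_error_rate` and `wer_reductions` and the numbers in the usage note are hypothetical illustrations, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1] / len(ref)


def wer_reductions(baseline_wer: float, adapted_wer: float) -> tuple[float, float]:
    """Return (absolute, relative) WER reduction; illustrative helper only."""
    absolute = baseline_wer - adapted_wer
    relative = absolute / baseline_wer
    return absolute, relative
```

For example, with made-up values, going from a WER of 0.30 to 0.25 is a 5.0% absolute reduction but roughly a 16.7% relative reduction, which is why the abstract's "absolute" qualifier matters when comparing results.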
Related papers
- Internal Language Model Training For Domain-adaptive End-to-end Speech Recognition (2021) · 11.39
- Internal Language Model Estimation For Domain-adaptive End-to-end Speech Recognition (2020) · 13.44
- Independent Language Modeling Architecture For End-to-end ASR (2019) · 0.00
- Internal Language Model Estimation Based Adaptive Language Model Fusion For Domain Adaptation (2022) · 0.00
- Integrating Text Inputs For Training And Adapting RNN Transducer ASR Models (2022) · 9.59
- Integrating Pre-trained Speech And Language Models For End-to-end Speech Recognition (2023) · 0.00
- Mask The Bias: Improving Domain-adaptive Generalization Of CTC-based ASR With Internal Language Model Estimation (2023) · 3.58
- Effective Text Adaptation For LLM-based ASR Through Soft Prompt Fine-tuning (2024) · 5.84