Text-only Domain Adaptation Using Unified Speech-text Representation In Transducer
2023 · Lu Huang, Boyu Li, Jun Zhang, et al.
Abstract
Domain adaptation using a text-only corpus is challenging in end-to-end (E2E) speech recognition. Adaptation by synthesizing audio from text through TTS is resource-consuming. We present a method to learn a Unified Speech-Text Representation in Conformer Transducer (USTR-CT) to enable fast domain adaptation using a text-only corpus. Unlike the previous textogram method, our approach introduces an extra text encoder to learn the text representation; this encoder is removed during inference, so online deployment requires no modification. To improve adaptation efficiency, single-step and multi-step adaptations are also explored. Experiments on adapting LibriSpeech to SPGISpeech show that the proposed method reduces the word error rate (WER) by a relative 44% on the target domain, outperforming both the TTS method and the textogram method. The proposed method can also be combined with internal language model estimation (ILME) to further improve performance.
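The "relative 44%" figure is a relative reduction, meaning the drop in WER divided by the baseline WER, not an absolute difference in percentage points. A minimal sketch of that arithmetic follows. The function name `relative_wer_reduction` and the two WER values are hypothetical and chosen only for illustration; they are not results reported in the paper.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Return the relative WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical WER values (in percent), assumed for illustration only.
# They are NOT taken from the paper.
baseline = 25.0  # target-domain WER before adaptation (assumed)
adapted = 14.0   # target-domain WER after adaptation (assumed)

reduction = relative_wer_reduction(baseline, adapted)
print(f"Absolute reduction: {baseline - adapted:.1f} points")
print(f"Relative reduction: {reduction:.0%}")
```

With these assumed values, an 11-point absolute drop corresponds to a 44% relative reduction, which shows why the two ways of reporting should not be confused.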
Related papers
- Text-only Domain Adaptation For End-to-end Speech Recognition Through Down-sampling Acoustic Representation (2023)
- A Simple Baseline For Domain Adaptation In End To End ASR Systems Using Synthetic Data (2022)
- Exploring Machine Speech Chain For Domain Adaptation And Few-shot Speaker Adaptation (2021)
- Integrating Text Inputs For Training And Adapting RNN Transducer ASR Models (2022)
- Unsupervised Domain Adaptation For Speech Recognition With Unsupervised Error Correction (2022)
- Label-synchronous Neural Transducer For Adaptable Online E2E Speech Recognition (2023)
- Rapid Speaker Adaptation In Low Resource Text To Speech Systems Using Synthetic Data And Transfer Learning (2023)
- Instance-based Model Adaptation For Direct Speech Translation (2019)