Multiple-hypothesis CTC-based Semi-supervised Adaptation Of End-to-end Speech Recognition
2021 · Cong-Thanh Do, Rama Doddipatla, Thomas Hain
Abstract
This paper proposes an adaptation method for end-to-end speech recognition. In this method, multiple automatic speech recognition (ASR) 1-best hypotheses are integrated into the computation of the connectionist temporal classification (CTC) loss function. Integrating multiple ASR hypotheses helps alleviate the impact of errors in the ASR hypotheses on the computation of the CTC loss when ASR hypotheses are used. When applied in semi-supervised adaptation scenarios where part of the adaptation data does not have labels, the CTC loss of the proposed method is computed from different ASR 1-best hypotheses obtained by decoding the unlabeled adaptation data. Experiments are performed in clean and multi-condition training scenarios where the CTC-based end-to-end ASR systems are trained on Wall Street Journal (WSJ) clean training data and CHiME-4 multi-condition training data, respectively, and tested on Aurora-4 test data. The proposed adaptation method yields 6.6% and 5.8% relati
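The idea of computing a CTC loss from several 1-best hypotheses can be illustrated with a small sketch. The abstract does not say how the per-hypothesis losses are combined, so the plain average used here is an assumption, and the function names `ctc_neg_log_likelihood` and `multi_hypothesis_ctc_loss` are hypothetical. The sketch uses the standard CTC forward recursion in pure Python and is not the authors' implementation.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) over the given values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_neg_log_likelihood(log_probs, labels, blank=0):
    """Standard CTC forward algorithm.

    log_probs: list of T frames, each a list of V log posteriors.
    labels: list of label indices (one hypothesis), without blanks.
    Returns -log p(labels | log_probs).
    """
    # Interleave blanks: [blank, l1, blank, l2, ..., blank]
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S, T = len(ext), len(log_probs)

    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]
            if s >= 1:
                terms.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new

    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total


def multi_hypothesis_ctc_loss(log_probs, hypotheses, blank=0):
    """Assumed combination rule: average the CTC loss over all hypotheses."""
    losses = [ctc_neg_log_likelihood(log_probs, h, blank) for h in hypotheses]
    return sum(losses) / len(losses)
```

In a semi-supervised setting, `hypotheses` would hold the 1-best outputs of different ASR systems for one unlabeled utterance, and `log_probs` the frame-level log posteriors of the model being adapted.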
Related papers
- Multilingual Training And Cross-lingual Adaptation On CTC-based Acoustic Model (2017)
- Multiple-hypothesis RNN-T Loss For Unsupervised Fine-tuning And Self-training Of Neural Transducer (2022)
- Speaker Adaptation For End-to-end CTC Models (2019)
- Multi-encoder Multi-resolution Framework For End-to-end Speech Recognition (2018)
- Audio Adversarial Examples For Robust Hybrid CTC/attention Speech Recognition (2020)
- End-to-end Multimodal Speech Recognition (2018)
- Continual Learning For Monolingual End-to-end Automatic Speech Recognition (2021)
- 3M: Multi-loss, Multi-path And Multi-level Neural Networks For Speech Recognition (2022)