Two-pass Decoding And Cross-adaptation Based System Combination Of End-to-end Conformer And Hybrid TDNN ASR Systems
2022 · Mingyu Cui, Jiajun Deng, Shoukang Hu, et al.
Abstract
Fundamental modelling differences between hybrid and end-to-end (E2E) automatic speech recognition (ASR) systems create large diversity and complementarity among them. This paper investigates multi-pass rescoring and cross-adaptation-based system combination approaches for hybrid TDNN and Conformer E2E ASR systems. In multi-pass rescoring, a state-of-the-art hybrid LF-MMI trained CNN-TDNN system featuring speed perturbation, SpecAugment and Bayesian learning hidden unit contributions (LHUC) speaker adaptation was used to produce initial N-best outputs, which were then rescored by the speaker-adapted Conformer system using a 2-way cross-system score interpolation. In cross adaptation, the hybrid CNN-TDNN system was adapted to the 1-best output of the Conformer system, or vice versa. Experiments on the 300-hour Switchboard corpus suggest that the combined systems derived using either of the two system combination approaches outperformed the individual systems. The best combined system obtained
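The N-best rescoring step described in the abstract can be illustrated with a minimal sketch. It assumes each hypothesis carries one log-domain score from the hybrid system and one from the Conformer system, and that the two are combined by a simple linear interpolation with a single weight. The names `Hypothesis`, `rescore_nbest` and `weight`, the example scores, and the exact form of the interpolation are all illustrative assumptions; this excerpt does not give the formula or the weights used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    hybrid_score: float     # log-domain score from the hybrid system (assumed)
    conformer_score: float  # log-domain score from the Conformer system (assumed)


def rescore_nbest(nbest, weight=0.5):
    """Return hypotheses sorted by interpolated score, best first.

    weight is the share given to the Conformer score; (1 - weight) goes to
    the hybrid score. This linear form is an assumption, not the paper's.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")

    def combined(h):
        return (1.0 - weight) * h.hybrid_score + weight * h.conformer_score

    return sorted(nbest, key=combined, reverse=True)


# Made-up scores for illustration only.
nbest = [
    Hypothesis("i see the cat", -12.0, -15.0),
    Hypothesis("i sea the cat", -11.5, -19.0),
    Hypothesis("i see a cat", -13.0, -14.5),
]
best = rescore_nbest(nbest, weight=0.5)[0]
```

In this toy list the two systems each prefer a different hypothesis, while the equal-weight interpolation selects a third one that both rank reasonably well. In practice the weight would be tuned on held-out data.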
Related papers
- Have Best Of Both Worlds: Two-pass Hybrid And E2E Cascading Framework For Speech Recognition (2021)
- Audio-attention Discriminative Language Model For ASR Rescoring (2019)
- Conformer-based Hybrid ASR System For Switchboard Dataset (2021)
- Multiple-hypothesis CTC-based Semi-supervised Adaptation Of End-to-end Speech Recognition (2021)
- Combining Frame-synchronous And Label-synchronous Systems For Speech Recognition (2021)
- 4D ASR: Joint Modeling Of CTC, Attention, Transducer, And Mask-predict Decoders (2022)
- Advancing CTC-CRF Based End-to-end Speech Recognition With Wordpieces And Conformers (2021)
- Confidence Score Based Conformer Speaker Adaptation For Speech Recognition (2022)