Combining Frame-synchronous And Label-synchronous Systems For Speech Recognition
2021 · Qiujia Li, Chao Zhang, Philip C. Woodland
Abstract
Commonly used automatic speech recognition (ASR) systems can be classified into frame-synchronous and label-synchronous categories, based on whether the speech is decoded on a per-frame or per-label basis. Frame-synchronous systems, such as traditional hidden Markov model systems, can easily incorporate existing knowledge and can support streaming ASR applications. Label-synchronous systems, based on attention-based encoder-decoder models, can jointly learn the acoustic and language information with a single model, which can be regarded as audio-grounded language models. In this paper, we propose rescoring the N-best hypotheses or lattices produced by a first-pass frame-synchronous system with a label-synchronous system in a second pass. By exploiting the complementary modelling of the different approaches, the combined two-pass systems achieve competitive performance without using any extra speech or text data on two standard ASR tasks. For the 80-hour AMI IHM dataset, the combined …
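The abstract describes a two-pass setup in which a label-synchronous (attention-based encoder-decoder) model rescores the N-best hypotheses of a frame-synchronous first pass. The excerpt does not state how the two systems' scores are combined, so the sketch below assumes a simple log-linear interpolation controlled by a hypothetical tuning weight `weight`. The names `Hypothesis`, `rescore_nbest`, and the toy score table are illustrative only, not the authors' code, and lattice rescoring is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Hypothesis:
    words: Tuple[str, ...]   # word sequence produced by the first pass
    first_pass_logp: float   # total log score from the frame-synchronous system


def rescore_nbest(
    nbest: List[Hypothesis],
    second_pass_logp: Callable[[Tuple[str, ...]], float],
    weight: float = 0.5,
) -> List[Tuple[float, Hypothesis]]:
    """Re-rank an N-best list by log-linearly interpolating two systems' scores.

    Assumption: combined = (1 - weight) * first_pass + weight * second_pass.
    weight=0 keeps the first-pass ranking; weight=1 uses only the second pass.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    rescored = []
    for hyp in nbest:
        # The label-synchronous model scores the whole label sequence given the audio;
        # here it is abstracted as a callable (a stand-in, not a real model API).
        combined = (1.0 - weight) * hyp.first_pass_logp + weight * second_pass_logp(hyp.words)
        rescored.append((combined, hyp))
    # Highest combined log score first; key avoids comparing Hypothesis objects.
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return rescored


if __name__ == "__main__":
    # Toy second-pass log probabilities (made-up numbers for illustration).
    toy_aed = {
        ("the", "cat", "sat"): -4.0,
        ("the", "cat", "sad"): -9.0,
        ("a", "cat", "sat"): -6.0,
    }
    nbest = [
        Hypothesis(("the", "cat", "sad"), -10.0),  # first-pass 1-best
        Hypothesis(("the", "cat", "sat"), -10.5),
        Hypothesis(("a", "cat", "sat"), -12.0),
    ]
    best_score, best = rescore_nbest(nbest, lambda w: toy_aed[w], weight=0.5)[0]
    print(" ".join(best.words), best_score)  # prints: the cat sat -7.25
```

In this toy run the second pass overturns the first-pass 1-best ("the cat sad"), which is the kind of complementary correction the two-pass combination aims for. In practice the interpolation weight would be tuned on held-out data.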
Related papers
- Integration Of Frame- And Label-synchronous Beam Search For Streaming Encoder-decoder Speech Recognition (2023)
- A Comparison Of Label-synchronous And Frame-synchronous End-to-end Models For Speech Recognition (2020)
- Label-synchronous Speech-to-text Alignment For ASR Using Forward And Backward Transformers (2021)
- Integrating Source-channel And Attention-based Sequence-to-sequence Models For Speech Recognition (2019)
- Audio-attention Discriminative Language Model For ASR Rescoring (2019)
- Label-synchronous Neural Transducer For Adaptable Online E2E Speech Recognition (2023)
- Streaming Language Identification Using Combination Of Acoustic Representations And ASR Hypotheses (2020)
- Investigating The Effect Of Label Topology And Training Criterion On ASR Performance And Alignment Quality (2024)