Label-synchronous Neural Transducer For Adaptable Online E2E Speech Recognition
2023 · Keqi Deng, Philip C. Woodland
Abstract
Although end-to-end (E2E) automatic speech recognition (ASR) has shown state-of-the-art recognition accuracy, it tends to be implicitly biased towards the training data distribution, which can degrade generalisation. This paper proposes a label-synchronous neural transducer (LS-Transducer), which provides a natural approach to domain adaptation based on text-only data. The LS-Transducer extracts a label-level encoder representation before combining it with the prediction network output. Since blank tokens are no longer needed, the prediction network acts as a standard language model, which can be easily adapted using text-only data. An Auto-regressive Integrate-and-Fire (AIF) mechanism is proposed to generate the label-level encoder representation while retaining low-latency operation that can be used for streaming. In addition, a streaming joint decoding method is designed to improve ASR accuracy while retaining synchronisation with AIF. Experiments show that compared to standard n
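The abstract does not spell out how AIF computes its weights, so the following is only a minimal sketch of the generic integrate-and-fire idea that the name refers to, not the paper's AIF. It assumes a per-frame weight is already available and that one label-level vector is emitted each time the accumulated weight reaches a threshold. The function name `integrate_and_fire`, the `threshold` parameter, and the toy inputs are all hypothetical.

```python
from typing import List


def integrate_and_fire(
    frames: List[List[float]],
    weights: List[float],
    threshold: float = 1.0,
) -> List[List[float]]:
    """Group frame-level vectors into label-level vectors.

    Generic integrate-and-fire sketch (an assumption, not the paper's AIF):
    weights are accumulated frame by frame, and a weighted sum of the
    frames seen so far is emitted whenever the running total reaches
    `threshold`. Only past and current frames are used, so the loop can
    run in a streaming fashion.
    """
    dim = len(frames[0]) if frames else 0
    outputs: List[List[float]] = []
    acc_weight = 0.0
    acc_vec = [0.0] * dim

    for frame, w in zip(frames, weights):
        # Fire as many times as this frame's weight allows.
        while acc_weight + w >= threshold:
            used = threshold - acc_weight  # part of w that completes the label
            outputs.append([a + used * x for a, x in zip(acc_vec, frame)])
            w -= used                      # the remainder starts the next label
            acc_weight = 0.0
            acc_vec = [0.0] * dim
        # Carry the leftover weight into the next label's accumulator.
        acc_weight += w
        acc_vec = [a + w * x for a, x in zip(acc_vec, frame)]

    return outputs


# Toy usage: four 1-dimensional frames, weights summing to 2.0, so two labels fire.
label_level = integrate_and_fire(
    [[1.0], [3.0], [2.0], [4.0]],
    [0.5, 0.5, 0.25, 0.75],
)
print(label_level)  # [[2.0], [3.5]]
```

In this sketch the number of emitted vectors follows the total accumulated weight rather than the number of frames, which is what makes the output label-synchronous instead of frame-synchronous.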
Related papers
- Label-synchronous Neural Transducer For E2E Simultaneous Speech Translation (2024)
- Label-synchronous Speech-to-text Alignment For ASR Using Forward And Backward Transformers (2021)
- Conv-transformer Transducer: Low Latency, Low Frame Rate, Streamable End-to-end Speech Recognition (2020)
- Integrating Text Inputs For Training And Adapting RNN Transducer ASR Models (2022)
- Transformer-transducers For Code-switched Speech Recognition (2020)
- Transformer-transducer: End-to-end Speech Recognition With Self-attention (2019)
- Transformer-based Online Speech Recognition With Decoder-end Adaptive Computation Steps (2020)
- Fast Offline Transformer-based End-to-end Automatic Speech Recognition For Real-world Applications (2021)