TSNAT: Two-step Non-autoregressive Transformer Models For Speech Recognition
2021 · Zhengkun Tian, Jiangyan Yi, Jianhua Tao, et al.
Abstract
Autoregressive (AR) models, such as attention-based encoder-decoder models and the RNN-Transducer, have achieved great success in speech recognition. They predict each output token conditioned on the previous tokens and the acoustic encoded states, which is inefficient on GPUs. Non-autoregressive (NAR) models remove the temporal dependency between output tokens and can predict the entire output sequence in as few as one step. However, NAR models still face two major problems. On the one hand, there is still a large performance gap between NAR models and advanced AR models. On the other hand, most NAR models are difficult to train and converge. To address these two problems, we propose a new model named the two-step non-autoregressive transformer (TSNAT), which improves the performance and accelerates the convergence of the NAR model by learning prior knowledge from a parameter-sharing AR model. Furthermore, we introduce the two-stage method into the i
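The contrast the abstract draws between AR and NAR decoding can be illustrated with a minimal toy sketch. It is not the TSNAT model: `ENCODED`, `ar_step`, `ar_decode`, and `nar_decode` are hypothetical names invented here, and the "decoder" simply copies a token per position. The only point is to count how many sequential decoder calls each scheme needs.

```python
from typing import List, Tuple

# Hypothetical stand-in for acoustic encoder output: one token id per position.
ENCODED: List[int] = [3, 1, 4, 1, 5]
EOS = 0  # assumed end-of-sequence id for this toy example


def ar_step(encoded: List[int], prefix: List[int]) -> int:
    """Toy AR decoder step: the next token depends on the tokens emitted so far."""
    pos = len(prefix)
    return encoded[pos] if pos < len(encoded) else EOS


def ar_decode(encoded: List[int]) -> Tuple[List[int], int]:
    """Emit tokens one at a time; each call must wait for the previous one."""
    prefix: List[int] = []
    calls = 0
    while True:
        token = ar_step(encoded, prefix)
        calls += 1
        if token == EOS:
            return prefix, calls
        prefix.append(token)


def nar_decode(encoded: List[int]) -> Tuple[List[int], int]:
    """Predict every position in a single pass, with no dependency on earlier outputs.

    Assumption: the output length is already known. A real NAR model has to
    estimate it in some way, which this toy example sidesteps.
    """
    return [encoded[pos] for pos in range(len(encoded))], 1


ar_tokens, ar_calls = ar_decode(ENCODED)
nar_tokens, nar_calls = nar_decode(ENCODED)
print(ar_tokens, ar_calls)
print(nar_tokens, nar_calls)
```

In this sketch both schemes produce the same tokens, but the AR loop needs one sequential call per token plus one for the end-of-sequence symbol, while the NAR pass needs a single call whose positions could be computed in parallel.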
Related papers
- Paraformer: Fast And Accurate Parallel Transformer For Non-autoregressive End-to-end Speech Recognition (2022) 15.10
- Non-autoregressive Transformer With Unified Bidirectional Decoder For Automatic Speech Recognition (2021) 7.81
- Improving Non-autoregressive End-to-end Speech Recognition With Pre-trained Acoustic And Language Models (2022) 10.07
- Non-autoregressive Transformer ASR With CTC-enhanced Decoder Input (2020) 10.97
- A CTC Alignment-based Non-autoregressive Transformer For End-to-end Automatic Speech Recognition (2023) 10.97
- An Improved Single Step Non-autoregressive Transformer For Automatic Speech Recognition (2021) 0.00
- A Comparative Study On Non-autoregressive Modelings For Speech-to-text Generation (2021) 11.76
- Non-autoregressive End-to-end Approaches For Joint Automatic Speech Recognition And Spoken Language Understanding (2023) 5.84