Self-attention Transducers For End-to-end Speech Recognition
2019 · Zhengkun Tian, Jiangyan Yi, Jianhua Tao, et al.
Abstract
Recurrent neural network transducers (RNN-T) have been successfully applied to end-to-end speech recognition. However, their recurrent structure makes them difficult to parallelize. In this paper, we propose a self-attention transducer (SA-T) for speech recognition, in which RNNs are replaced with self-attention blocks; these blocks are powerful at modeling long-term dependencies inside sequences and can be efficiently parallelized. Furthermore, a path-aware regularization is proposed to help SA-T learn alignments and improve its performance. Additionally, a chunk-flow mechanism is utilized to achieve online decoding. All experiments are conducted on the Mandarin Chinese dataset AISHELL-1. The results demonstrate that our proposed approach achieves a 21.3% relative reduction in character error rate compared with the baseline RNN-T. In addition, SA-T with the chunk-flow mechanism can perform online decoding with only slight performance degradation.
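The abstract does not spell out how the chunk-flow mechanism works. The following is a minimal NumPy sketch under one assumption: chunk-flow amounts to restricting each encoder frame's self-attention to its own fixed-size chunk plus a limited number of preceding chunks. This is a common way to make self-attention streamable, but the paper's actual configuration may differ. `chunk_mask`, `chunked_self_attention`, `chunk_size` and `left_chunks` are hypothetical names. The layer is single-head and projection-only, so it omits the feed-forward, residual and normalization parts of a full self-attention block.

```python
import numpy as np


def chunk_mask(num_frames: int, chunk_size: int, left_chunks: int) -> np.ndarray:
    """Boolean mask where mask[i, j] is True if frame i may attend to frame j.

    Assumed chunk-flow rule (hypothetical): frame i sees every frame in its own
    chunk and in up to `left_chunks` preceding chunks, never a later chunk.
    """
    chunk_ids = np.arange(num_frames) // chunk_size
    query = chunk_ids[:, None]  # chunk index of each attending frame
    key = chunk_ids[None, :]    # chunk index of each attended frame
    return (key <= query) & (key >= query - left_chunks)


def chunked_self_attention(x, w_q, w_k, w_v, chunk_size, left_chunks):
    """Single-head scaled dot-product self-attention restricted by chunk_mask."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    mask = chunk_mask(x.shape[0], chunk_size, left_chunks)
    # Disallowed positions get -inf, so their softmax weight becomes exactly 0.
    scores = np.where(mask, scores, -np.inf)
    # Every row keeps at least its own chunk, so the row max is finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Usage: 8 frames of 4-dim features, chunks of 2 frames, 1 chunk of left context.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))
w_q, w_k, w_v = (rng.standard_normal((4, 4)) for _ in range(3))
out, attn = chunked_self_attention(x, w_q, w_k, w_v, chunk_size=2, left_chunks=1)
print(out.shape)  # (8, 4)
```

Because no frame attends to a later chunk, the outputs for a chunk can be computed as soon as that chunk has arrived. Under this assumption, latency is bounded by the chunk size. In exchange, each frame sees less context than full-utterance attention, which is one plausible source of the small degradation reported for online decoding.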
Related papers
- Unidirectional Memory-self-attention Transducer For Online Speech Recognition (2021) · 3.58
- Transformer-transducer: End-to-end Speech Recognition With Self-attention (2019) · 0.00
- Improving RNN Transducer Modeling For End-to-end Speech Recognition (2019) · 0.00
- Simplified Self-attention For Transformer-based End-to-end Speech Recognition (2020) · 10.61
- Cascade RNN-transducer: Syllable Based Streaming On-device Mandarin Speech Recognition With A Syllable-to-character Converter (2020) · 9.92
- Exploring Architectures, Data And Units For Streaming End-to-end Speech Recognition With RNN-transducer (2018) · 16.21
- Exploring RNN-transducer For Chinese Speech Recognition (2018) · 9.23
- Transformer Transducer: A Streamable Speech Recognition Model With Transformer Encoders And RNN-T Loss (2020) · 18.58