Transformer-based Online CTC/attention End-to-end Speech Recognition Architecture
2020 · Haoran Miao, Gaofeng Cheng, Changfeng Gao, et al.
Abstract
Recently, the Transformer has achieved success in the automatic speech recognition (ASR) field. However, it is challenging to deploy a Transformer-based end-to-end (E2E) model for online speech recognition. In this paper, we propose a Transformer-based online CTC/attention E2E ASR architecture, which contains the chunk self-attention encoder (chunk-SAE) and the monotonic truncated attention (MTA) based self-attention decoder (SAD). First, the chunk-SAE splits the speech into isolated chunks. To reduce the computational cost and improve performance, we propose the state reuse chunk-SAE. Second, the MTA-based SAD truncates the speech features monotonically and performs attention on the truncated features. To support online recognition, we integrate the state reuse chunk-SAE and the MTA-based SAD into the online CTC/attention architecture. We evaluate the proposed online models on the HKUST Mandarin ASR benchmark and achieve a 23.66% character error rate (CER) with a 320 ms latency. Our
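The chunking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `chunk_size` and `left_context` parameters, and the use of raw frames as a stand-in for cached encoder states are all assumptions made for illustration. The first function splits a frame sequence into isolated chunks; the second pairs each chunk with a few preceding frames, loosely mimicking how reused state could give a chunk access to history without reprocessing the whole utterance.

```python
from typing import List, Sequence, Tuple


def split_into_chunks(frames: Sequence[float], chunk_size: int) -> List[List[float]]:
    """Split a frame sequence into consecutive, non-overlapping chunks.

    The last chunk may be shorter than chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(frames[i:i + chunk_size]) for i in range(0, len(frames), chunk_size)]


def chunks_with_reused_context(
    frames: Sequence[float], chunk_size: int, left_context: int
) -> List[Tuple[List[float], List[float]]]:
    """Pair each chunk with up to `left_context` frames that precede it.

    Assumption: the preceding raw frames stand in for cached hidden states;
    a real encoder would store and reuse layer outputs instead.
    """
    if chunk_size <= 0 or left_context < 0:
        raise ValueError("chunk_size must be positive and left_context non-negative")
    pairs = []
    for start in range(0, len(frames), chunk_size):
        ctx_start = max(0, start - left_context)
        context = list(frames[ctx_start:start])          # history available to this chunk
        chunk = list(frames[start:start + chunk_size])   # frames processed in this step
        pairs.append((context, chunk))
    return pairs


if __name__ == "__main__":
    toy_frames = list(range(7))  # hypothetical 7-frame utterance
    print(split_into_chunks(toy_frames, 3))
    print(chunks_with_reused_context(toy_frames, 3, 2))
```

In this toy setup the first chunk has no history, and every later chunk sees only a bounded window of earlier frames, so the work per step stays constant regardless of utterance length.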
Related papers
- Online Hybrid CTC/attention End-to-end Automatic Speech Recognition Architecture (2023) · 12.99
- Towards Online End-to-end Transformer Automatic Speech Recognition (2019) · 0.00
- Transformer-based Online Speech Recognition With Decoder-end Adaptive Computation Steps (2020) · 7.81
- Synchronous Transformers For End-to-end Speech Recognition (2019) · 12.02
- Improving Hybrid CTC/attention End-to-end Speech Recognition With Pretrained Acoustic And Language Model (2021) · 8.82
- A CTC Alignment-based Non-autoregressive Transformer For End-to-end Automatic Speech Recognition (2023) · 10.97
- Unidirectional Memory-self-attention Transducer For Online Speech Recognition (2021) · 3.58
- Fast Offline Transformer-based End-to-end Automatic Speech Recognition For Real-world Applications (2021) · 7.16