Attention-based Gated Scaling Adaptative Acoustic Model For CTC-based Speech Recognition
2019 · Fenglin Ding, Wu Guo, Lirong Dai, et al.
Abstract
In this paper, we propose a novel adaptive technique that uses an attention-based gated scaling (AGS) scheme to improve deep feature learning for connectionist temporal classification (CTC) acoustic modeling. In AGS, the outputs of each hidden layer of the main network are scaled by an auxiliary gate matrix that is extracted from the lower layer using attention mechanisms. Furthermore, the auxiliary AGS layer and the main network are jointly trained, without requiring second-pass model training or additional speaker information such as speaker codes. On the Mandarin AISHELL-1 dataset, the proposed AGS yields a 7.94% character error rate (CER). To the best of our knowledge, this is the best recognition accuracy achieved on this dataset with an end-to-end framework.
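The scaling step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so everything here is an assumption made for illustration. The main network is stood in for by a single feed-forward tanh layer, the attention is plain scaled dot-product self-attention over time, the gate is squashed with a sigmoid, and all names (`attention_gate`, `ags_layer`, the weight matrices) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_gate(h_lower, w_q, w_k, w_v):
    """Hypothetical gate matrix built from the lower layer's output.

    Assumption: scaled dot-product self-attention over time frames,
    followed by a sigmoid so every gate value lies in (0, 1).
    """
    q = h_lower @ w_q                          # (T, d_att)
    k = h_lower @ w_k                          # (T, d_att)
    v = h_lower @ w_v                          # (T, d_out)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (T, T)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    context = weights @ v                      # (T, d_out)
    return sigmoid(context)


def ags_layer(h_lower, w_main, b_main, w_q, w_k, w_v):
    """Scale the main layer's output element-wise by the gate matrix."""
    # Assumption: a feed-forward tanh layer stands in for the main network.
    main_out = np.tanh(h_lower @ w_main + b_main)       # (T, d_out)
    gate = attention_gate(h_lower, w_q, w_k, w_v)       # (T, d_out)
    return gate * main_out


# Toy dimensions: 5 frames, 8 input features, 6 hidden units, 4 attention dims.
rng = np.random.default_rng(0)
T, d_in, d_out, d_att = 5, 8, 6, 4
h_lower = rng.standard_normal((T, d_in))
w_main = rng.standard_normal((d_in, d_out)) * 0.1
b_main = np.zeros(d_out)
w_q = rng.standard_normal((d_in, d_att)) * 0.1
w_k = rng.standard_normal((d_in, d_att)) * 0.1
w_v = rng.standard_normal((d_in, d_out)) * 0.1

gate = attention_gate(h_lower, w_q, w_k, w_v)
scaled = ags_layer(h_lower, w_main, b_main, w_q, w_k, w_v)
```

Because the gate is computed from the same lower-layer output that feeds the main layer, both sets of weights receive gradients from a single loss. That is one way to read the abstract's statement that the auxiliary layer and the main network are trained jointly, without a second training pass.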
Related papers
- Advancing Connectionist Temporal Classification With Attention Modeling (2018) · 11.49
- Multiple-hypothesis CTC-based Semi-supervised Adaptation Of End-to-end Speech Recognition (2021) · 5.84
- An Improved Hybrid CTC-attention Model For Speech Recognition (2018) · 0.00
- Self-attention Networks For Connectionist Temporal Classification In Speech Recognition (2019) · 14.55
- Attention-based Scaling Adaptation For Target Speech Extraction (2020) · 8.09
- Advances In Joint CTC-attention Based End-to-end Speech Recognition With A Deep CNN Encoder And RNN-LM (2017) · 16.49
- Linguistic-enhanced Transformer With CTC Embedding For Speech Recognition (2022) · 2.26
- End-to-end Speech Recognition With Adaptive Computation Steps (2018) · 0.00