Residual Convolutional CTC Networks For Automatic Speech Recognition
2017 · Yisen Wang, Xuejiao Deng, Songbai Pu, et al.
Abstract
Deep learning approaches have been widely used in Automatic Speech Recognition (ASR) and have achieved significant accuracy improvements. In particular, Convolutional Neural Networks (CNNs) have recently been revisited in ASR. However, most CNNs used in existing work have fewer than 10 layers, which may not be deep enough to capture all the information in human speech signals. In this paper, we propose a novel deep and wide CNN architecture, denoted RCNN-CTC, which has residual connections and uses the Connectionist Temporal Classification (CTC) loss function. RCNN-CTC is an end-to-end system that can exploit the temporal and spectral structures of speech signals simultaneously. Furthermore, we introduce a CTC-based system combination, which differs from the conventional frame-wise senone-based one. The basic subsystems adopted in the combination are of different types and thus mutually complementary. Experimental results show that our proposed single system RCNN-CTC can achieve the low
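The CTC loss named in the abstract can be illustrated with a small, dependency-free sketch. This is the standard CTC forward recursion, not the authors' implementation: the blank index `BLANK = 0`, the function names, and the toy probabilities in the usage note are all assumptions made for illustration. A real system would take the per-frame probabilities from the network's softmax output and work in log space for numerical stability.

```python
import itertools
import math

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def collapse(path):
    """Map a frame-level path to labels: merge repeats, then drop blanks."""
    merged = [k for k, _ in itertools.groupby(path)]
    return [k for k in merged if k != BLANK]


def ctc_probability(probs, target):
    """Total probability of all frame-level paths that collapse to `target`.

    probs[t][k] is the probability of symbol k at frame t.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [BLANK]
    for label in target:
        ext += [label, BLANK]
    n_states = len(ext)

    # Initialisation: a path may start on the leading blank or the first label.
    alpha = [0.0] * n_states
    alpha[0] = probs[0][ext[0]]
    if n_states > 1:
        alpha[1] = probs[0][ext[1]]

    for frame in probs[1:]:
        new = [0.0] * n_states
        for s in range(n_states):
            total = alpha[s]              # stay in the same state
            if s >= 1:
                total += alpha[s - 1]     # advance by one state
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * frame[ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if n_states > 1 else 0.0)


def ctc_loss(probs, target):
    """Negative log-likelihood of `target`; infinite if no path can produce it."""
    p = ctc_probability(probs, target)
    return -math.log(p) if p > 0 else math.inf


def brute_force_probability(probs, target):
    """Reference check: enumerate every path (only feasible for tiny inputs)."""
    n_symbols = len(probs[0])
    total = 0.0
    for path in itertools.product(range(n_symbols), repeat=len(probs)):
        if collapse(path) == list(target):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total
```

For a toy input such as `probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]` and `target = [1, 2]`, `ctc_probability` and `brute_force_probability` should agree up to floating-point error, which is a quick way to sanity-check the recursion.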
Related papers
- Advances In Joint CTC-Attention Based End-to-End Speech Recognition With A Deep CNN Encoder And RNN-LM (2017)
- A Study Of All-Convolutional Encoders For Connectionist Temporal Classification (2017)
- Improved Mask-CTC For Non-Autoregressive End-to-End ASR (2020)
- A CTC Alignment-Based Non-Autoregressive Transformer For End-to-End Automatic Speech Recognition (2023)
- Advances In All-Neural Speech Recognition (2016)
- CR-CTC: Consistency Regularization On CTC For Improved Speech Recognition (2024)
- kNN-CTC: Enhancing ASR Via Retrieval Of CTC Pseudo Labels (2023)
- Single Channel Speech Enhancement Using Temporal Convolutional Recurrent Neural Networks (2020)