A Study Of All-convolutional Encoders For Connectionist Temporal Classification
2017 · Kalpesh Krishna, Liang Lu, Kevin Gimpel, et al.
Abstract
Connectionist temporal classification (CTC) is a popular sequence prediction approach for automatic speech recognition that is typically used with models based on recurrent neural networks (RNNs). We explore whether deep convolutional neural networks (CNNs) can be used effectively instead of RNNs as the "encoder" in CTC. CNNs lack an explicit representation of the entire sequence, but have the advantage that they are much faster to train. We present an exploration of CNNs as encoders for CTC models, in the context of character-based (lexicon-free) automatic speech recognition. In particular, we explore a range of one-dimensional convolutional layers, which are particularly efficient. We compare the performance of our CNN-based models against typical RNN-based models in terms of training time, decoding time, model size and word error rate (WER) on the Switchboard Eval2000 corpus. We find that our CNN-based models are close in performance to LSTMs, while not matching them, and are much fa
Related papers
- Residual Convolutional CTC Networks For Automatic Speech Recognition (2017) 0.00
- Self-attention Networks For Connectionist Temporal Classification In Speech Recognition (2019) 14.55
- Comparison Of Decoding Strategies For CTC Acoustic Models (2017) 10.48
- Advancing Connectionist Temporal Classification With Attention Modeling (2018) 11.49
- Training LDCRF Model On Unsegmented Sequences Using Connectionist Temporal Classification (2016) 2.26
- CR-CTC: Consistency Regularization On CTC For Improved Speech Recognition (2024) 6.30
- Variational Connectionist Temporal Classification For Order-preserving Sequence Modeling (2023) 5.24
- BERT Meets CTC: New Formulation Of End-to-end Speech Recognition With Pre-trained Masked Language Model (2022) 0.00