End-to-end Speech Recognition Using A High Rank LSTM-CTC Based Model
2019 · Yangyang Shi, Mei-Yuh Hwang, Xin Lei
Abstract
Long Short-Term Memory Connectionist Temporal Classification (LSTM-CTC) based end-to-end models are widely used in speech recognition owing to their simplicity in training and efficiency in decoding. In conventional LSTM-CTC based models, a bottleneck projection matrix maps the hidden feature vectors obtained from the LSTM to the softmax output layer. In this paper, we propose replacing the projection matrix with a high rank projection layer. The output of the high rank projection layer is a weighted combination of vectors, each projected from the hidden feature vectors via a different projection matrix followed by a non-linear activation function. The high rank projection layer improves the expressiveness of LSTM-CTC models. Experimental results show that on the Wall Street Journal (WSJ) corpus and the LibriSpeech data set, the proposed method achieves a 4%-6% relative word error rate (WER) reduction over the baseline CTC system. The proposed models also outperform other published CTC based end-to-end (E2E) models.
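The weighted combination described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the combination weights are computed or which activation is used, so the softmax gate (`gate`), the `tanh` activation, the function name `high_rank_projection`, and all dimensions below are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def high_rank_projection(hidden, projections, gate):
    """Mix several non-linear projections of one hidden vector.

    hidden:      LSTM hidden feature vector, length d
    projections: K projection matrices, each m x d
    gate:        K x d matrix producing one mixing score per branch
                 (assumed here; the abstract only says "weighted")
    Returns the mixed vector (length m) and the K mixing weights.
    """
    weights = softmax(matvec(gate, hidden))
    # Each branch: its own projection matrix, then a non-linearity (tanh assumed).
    branches = [[math.tanh(v) for v in matvec(p, hidden)] for p in projections]
    out_dim = len(branches[0])
    mixed = [sum(w * b[i] for w, b in zip(weights, branches)) for i in range(out_dim)]
    return mixed, weights


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
hidden_dim, proj_dim, num_branches = 8, 4, 3  # illustrative sizes only
hidden = [rng.uniform(-1.0, 1.0) for _ in range(hidden_dim)]
projections = [random_matrix(proj_dim, hidden_dim, rng) for _ in range(num_branches)]
gate = random_matrix(num_branches, hidden_dim, rng)

mixed, weights = high_rank_projection(hidden, projections, gate)
print(len(mixed), len(weights))
```

In a full model the mixed vector would feed the softmax output layer in place of the single bottleneck projection. Because the mixing weights depend on the hidden vector, the overall mapping is no longer one fixed low-rank linear map, which is the intuition behind the claimed gain in expressiveness.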
Related papers
- An Improved Hybrid CTC-attention Model For Speech Recognition (2018)
- Improving RNN Transducer Modeling For End-to-end Speech Recognition (2019)
- BERT Meets CTC: New Formulation Of End-to-end Speech Recognition With Pre-trained Masked Language Model (2022)
- Residual Convolutional CTC Networks For Automatic Speech Recognition (2017)
- End-to-end Speech Recognition With Word-based RNN Language Models (2018)
- Joint CTC-attention Based End-to-end Speech Recognition Using Multi-task Learning (2016)
- Advances In Joint CTC-attention Based End-to-end Speech Recognition With A Deep CNN Encoder And RNN-LM (2017)
- Hierarchical Conditional End-to-end ASR With CTC And Multi-granular Subword Units (2021)