Improving LSTM-CTC Based ASR Performance In Domains With Limited Training Data
2017 · Jayadev Billa
Abstract
This paper addresses the performance gap observed, on domains with limited training data, between automatic speech recognition (ASR) systems based on Long Short-Term Memory (LSTM) neural networks trained with the connectionist temporal classification (CTC) loss function and hybrid Deep Neural Network (DNN) systems trained with the cross-entropy (CE) loss function. We step through a series of experiments that show incremental improvements over a baseline LSTM-CTC ASR system built with the EESEN toolkit and trained on the LibriSpeech 100-hour (train-clean-100) corpus. Our results show that, with an effective combination of data augmentation and regularization, an LSTM-CTC based system can exceed the performance of a strong Kaldi-based baseline trained on the same data.
Related papers
- CTC-Segmentation Of Large Corpora For German End-to-end Speech Recognition (2020)
- kNN-CTC: Enhancing ASR Via Retrieval Of CTC Pseudo Labels (2023)
- Linguistic-enhanced Transformer With CTC Embedding For Speech Recognition (2022)
- Residual Convolutional CTC Networks For Automatic Speech Recognition (2017)
- An Improved Hybrid CTC-Attention Model For Speech Recognition (2018)
- Comparing The Benefit Of Synthetic Training Data For Various Automatic Speech Recognition Architectures (2021)
- Continual Learning For Monolingual End-to-end Automatic Speech Recognition (2021)
- Improved Mask-CTC For Non-autoregressive End-to-end ASR (2020)