Efficient Segmental Cascades For Speech Recognition
2016 · Hao Tang, Weiran Wang, Kevin Gimpel, et al.
Abstract
Discriminative segmental models offer a way to incorporate flexible feature functions into speech recognition. However, their appeal has been limited by their computational requirements, due to the large number of possible segments to consider. Multi-pass cascades of segmental models introduce features of increasing complexity in different passes, where in each pass a segmental model rescores lattices produced by a previous (simpler) segmental model. In this paper, we explore several ways of making segmental cascades efficient and practical: reducing the feature set in the first pass, frame subsampling, and various pruning approaches. In experiments on phonetic recognition, we find that with a combination of such techniques, it is possible to maintain competitive performance while greatly reducing decoding, pruning, and training time.
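To make the "large number of possible segments" concrete, the sketch below counts the candidate segments in an utterance under a maximum segment duration, then repeats the count after frame subsampling. It is only an illustration: the utterance length (300 frames), the duration limit (30 frames), the subsampling factor (2), and the use of plain decimation are all assumptions chosen for the example, not values or methods taken from the paper.

```python
def count_segments(num_frames: int, max_duration: int) -> int:
    """Count segments (start, end) with 1 <= end - start <= max_duration."""
    total = 0
    for start in range(num_frames):
        # A segment starting at `start` may span at most max_duration frames
        # and cannot extend past the end of the utterance.
        total += min(max_duration, num_frames - start)
    return total


def subsample(frames, factor):
    """Keep every `factor`-th frame (plain decimation, assumed for illustration)."""
    return frames[::factor]


# Hypothetical utterance: 300 frames, segments limited to 30 frames.
frames = list(range(300))
full = count_segments(len(frames), max_duration=30)

# After dropping every other frame, the same time span covers half as many
# frames, so the duration limit is halved as well.
reduced = count_segments(len(subsample(frames, 2)), max_duration=15)

print(full, reduced)
```

Under these assumptions, halving the frame rate cuts the number of candidate segments by roughly a factor of four, because both the number of start points and the number of allowed durations shrink.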
Related papers
- Sequence Prediction With Neural Segmental Models (2017)
- Segmental Recurrent Neural Networks For End-to-end Speech Recognition (2016)
- End-to-end Neural Segmental Models For Speech Recognition (2017)
- Multitask Learning With CTC And Segmental CRF For Speech Recognition (2017)
- Tight Integrated End-to-end Training For Cascaded Speech Translation (2020)
- End-to-end Training Approaches For Discriminative Segmental Models (2016)
- CTC Blank Triggered Dynamic Layer-skipping For Efficient CTC-based Speech Recognition (2024)
- Constrained Convolutional-recurrent Networks To Improve Speech Quality With Low Impact On Recognition Accuracy (2018)