Acoustic-to-word Model Without OOV
2017 · Jinyu Li, Guoli Ye, Rui Zhao, et al.
Abstract
Recently, the acoustic-to-word model based on the Connectionist Temporal Classification (CTC) criterion was shown to be a natural end-to-end model that directly targets words as output units. However, this type of word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can only model a limited number of words in the output layer and maps all the remaining words into an OOV output node. Therefore, such a word-based CTC model can only recognize the frequent words modeled by the network output nodes. It also cannot easily handle hot-words that emerge after the model is trained. In this study, we improve the acoustic-to-word model with a hybrid CTC model that can predict both words and characters at the same time. With a shared-hidden-layer structure and modular design, the alignments of words generated from the word-based CTC and the character-based CTC are synchronized. Whenever the acoustic-to-word model emits an OOV token, we back off that OOV segment to the word output g
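The back-off idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes greedy, frame-level label sequences from the two CTC heads are already available and time-synchronized, and it picks the character-derived word with the largest frame overlap as the replacement, which is an assumption made here for illustration. The names `BLANK`, `OOV`, `collapse_with_spans`, `char_words`, and `back_off_oov` are all hypothetical.

```python
BLANK = "<b>"    # hypothetical CTC blank symbol
OOV = "<OOV>"    # hypothetical OOV output token of the word head


def collapse_with_spans(frames):
    """Greedy CTC collapse: merge repeats, drop blanks, keep frame spans."""
    spans = []
    prev = BLANK
    for t, label in enumerate(frames):
        if label != BLANK:
            if label == prev:
                spans[-1][2] = t              # extend the current emission
            else:
                spans.append([label, t, t])   # start a new emission
        prev = label
    return [tuple(s) for s in spans]


def char_words(char_frames, space=" "):
    """Group collapsed characters into (word, start_frame, end_frame)."""
    words, cur, start, end = [], [], None, None
    for label, s, e in collapse_with_spans(char_frames):
        if label == space:
            if cur:
                words.append(("".join(cur), start, end))
            cur, start, end = [], None, None
        else:
            if start is None:
                start = s
            cur.append(label)
            end = e
    if cur:
        words.append(("".join(cur), start, end))
    return words


def overlap(s1, e1, s2, e2):
    """Number of frames shared by two inclusive frame ranges."""
    return max(0, min(e1, e2) - max(s1, s2) + 1)


def back_off_oov(word_frames, char_frames):
    """Replace each OOV emission with the best-overlapping character word."""
    word_spans = collapse_with_spans(word_frames)
    candidates = char_words(char_frames)
    out = []
    for i, (label, _s, _e) in enumerate(word_spans):
        if label != OOV:
            out.append(label)
            continue
        # Assumed OOV segment: the gap between the neighbouring word emissions.
        seg_start = word_spans[i - 1][2] + 1 if i > 0 else 0
        seg_end = (word_spans[i + 1][1] - 1 if i + 1 < len(word_spans)
                   else len(word_frames) - 1)
        best = max(candidates,
                   key=lambda c: overlap(seg_start, seg_end, c[1], c[2]),
                   default=None)
        if best is not None and overlap(seg_start, seg_end, best[1], best[2]) > 0:
            out.append(best[0])
        else:
            out.append(OOV)                   # nothing usable to back off to
    return out


# Toy frame-level outputs (made up for illustration, 14 frames each).
word_frames = ["<b>", "play", "<b>", "<b>", "<b>", "<b>", "<OOV>", "<OOV>",
               "<b>", "<b>", "<b>", "<b>", "now", "<b>"]
char_frames = ["p", "l", "a", "y", " ", "a", "b", "<b>", "b", "a", " ",
               "n", "o", "w"]
print(back_off_oov(word_frames, char_frames))  # ['play', 'abba', 'now']
```

In this toy input the word head emits an OOV token around frames 6 and 7, and the character head spells "abba" over frames 5 to 9, so the OOV token is replaced by that character-derived word while the in-vocabulary words are kept from the word head.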
Related papers
- Improving OOV Detection And Resolution With External Language Models In Acoustic-to-word ASR (2019)
- Acoustic-to-word Recognition With Sequence-to-sequence Models (2018)
- Using Multi-task Learning To Improve The Performance Of Acoustic-to-word And Conventional Hybrid Models (2019)
- Comparison Of Decoding Strategies For CTC Acoustic Models (2017)
- Using Synthetic Audio To Improve The Recognition Of Out-of-vocabulary Words In End-to-end ASR Systems (2020)
- Hierarchical Conditional End-to-end ASR With CTC And Multi-granular Subword Units (2021)
- Towards Personalization Of CTC Speech Recognition Models With Contextual Adapters And Adaptive Boosting (2022)
- Neural Speech Recognizer: Acoustic-to-word LSTM Model For Large Vocabulary Speech Recognition (2016)