Word-level Speech Recognition With A Letter To Word Encoder
2019 · Ronan Collobert, Awni Hannun, Gabriel Synnaeve
Abstract
We propose a direct-to-word sequence model which uses a word network to learn word embeddings from letters. The word network can be integrated seamlessly with arbitrary sequence models including Connectionist Temporal Classification and encoder-decoder models with attention. We show our direct-to-word model can achieve word error rate gains over sub-word level models for speech recognition. We also show that our direct-to-word approach retains the ability to predict words not seen at training time without any retraining. Finally, we demonstrate that a word-level model can use a larger stride than a sub-word level model while maintaining accuracy. This makes the model more efficient both for training and inference.
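The idea of building word embeddings from letters, and then scoring words against the output of a sequence model, can be illustrated with a small sketch. What follows is a toy illustration, not the authors' architecture. The class WordEncoder, the function word_log_probs, the layer sizes, and the choice of a single convolution followed by max-pooling are all assumptions made for the example, and the weights are random rather than trained. It only shows why such a model can score a word it has never seen: the output embedding is computed from the spelling, so extending the vocabulary adds no parameters.

```python
import numpy as np

# Hypothetical letter inventory for the example (not taken from the paper).
ALPHABET = "abcdefghijklmnopqrstuvwxyz'"
LETTER_TO_ID = {ch: i for i, ch in enumerate(ALPHABET)}


class WordEncoder:
    """Toy letter-to-word encoder: letter embeddings -> 1D conv -> max-pool."""

    def __init__(self, letter_dim=8, word_dim=16, kernel=3, seed=0):
        rng = np.random.default_rng(seed)
        self.kernel = kernel
        # Random (untrained) parameters; a real model would learn these.
        self.letter_emb = rng.normal(size=(len(ALPHABET), letter_dim))
        self.conv_w = rng.normal(size=(kernel * letter_dim, word_dim)) * 0.1
        self.conv_b = np.zeros(word_dim)

    def encode(self, word):
        """Map a spelling to a fixed-size vector of shape (word_dim,)."""
        ids = [LETTER_TO_ID[ch] for ch in word]
        x = self.letter_emb[ids]                        # (len(word), letter_dim)
        pad = np.zeros((self.kernel - 1, x.shape[1]))
        x = np.concatenate([pad, x, pad], axis=0)       # short words still yield windows
        windows = np.stack([
            x[i:i + self.kernel].ravel()
            for i in range(x.shape[0] - self.kernel + 1)
        ])                                              # (positions, kernel * letter_dim)
        h = np.maximum(windows @ self.conv_w + self.conv_b, 0.0)  # ReLU
        return h.max(axis=0)                            # max-pool over letter positions

    def encode_vocab(self, words):
        """Stack the embeddings of a word list into a (V, word_dim) matrix."""
        return np.stack([self.encode(w) for w in words])


def word_log_probs(frames, word_embeddings):
    """Log-softmax over words for each frame: (T, D) x (V, D) -> (T, V)."""
    scores = frames @ word_embeddings.T
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    return scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))


encoder = WordEncoder()
train_vocab = ["speech", "word", "letter"]

# Stand-in for the per-frame outputs of an acoustic sequence model.
frames = np.random.default_rng(1).normal(size=(5, 16))

log_probs = word_log_probs(frames, encoder.encode_vocab(train_vocab))

# Adding a new word only requires encoding its spelling; no new parameters.
extended_vocab = train_vocab + ["encoder"]
log_probs_ext = word_log_probs(frames, encoder.encode_vocab(extended_vocab))
```

In this sketch the per-frame word scores could feed a criterion such as Connectionist Temporal Classification or an attention decoder, as the abstract describes. That wiring is left out here.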
Related papers
- Acoustic-to-word Recognition With Sequence-to-sequence Models (2018)
- Towards Better Decoding And Language Model Integration In Sequence To Sequence Models (2016)
- Letter-based Speech Recognition With Gated Convnets (2017)
- Wav2letter: An End-to-end Convnet-based Speech Recognition System (2016)
- Speech2vec: A Sequence-to-sequence Framework For Learning Word Embeddings From Speech (2018)
- Learning Word Embeddings From Speech (2017)
- End-to-end Speech Recognition With Word-based RNN Language Models (2018)
- Phoneme Based Neural Transducer For Large Vocabulary Speech Recognition (2020)