Wav2letter: An End-to-end Convnet-based Speech Recognition System
2016 · Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve
Abstract
This paper presents a simple end-to-end model for speech recognition, combining a convolutional-network-based acoustic model with a graph decoder. It is trained on transcribed speech to output letters, without the need for forced alignment of phonemes. We introduce an automatic segmentation criterion for training from sequence annotation without alignment that is on par with CTC while being simpler. We show competitive word error rates on the LibriSpeech corpus with MFCC features, and promising results from the raw waveform.
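The abstract reports results as word error rate. As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), the function below may help; the name `word_error_rate` and the whitespace tokenization are assumptions made for illustration and are not taken from the paper, whose own scoring pipeline may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (not from the paper): words are obtained by
    splitting on whitespace, with no case or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.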
Related papers
- Letter-based Speech Recognition With Gated Convnets (2017) 0.00
- Fully Convolutional Speech Recognition (2018) 0.00
- Phoneme Based Neural Transducer For Large Vocabulary Speech Recognition (2020) 9.59
- Attention-based Wav2text With Feature Transfer Learning (2017) 8.60
- Advances In All-neural Speech Recognition (2016) 11.29
- Wav2vec: Unsupervised Pre-training For Speech Recognition (2019) 0.00
- Improving Non-autoregressive End-to-end Speech Recognition With Pre-trained Acoustic And Language Models (2022) 10.07
- Acoustic-to-word Recognition With Sequence-to-sequence Models (2018) 6.77