Letter-based Speech Recognition With Gated Convnets
2017 · Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert
Abstract
In the recent literature, "end-to-end" speech systems often refer to letter-based acoustic models trained in a sequence-to-sequence manner, either via a recurrent model or via a structured output learning approach (such as CTC). In contrast to traditional phone (or senone)-based approaches, these "end-to-end" approaches alleviate the need for word pronunciation modeling, and do not require a "forced alignment" step at training time. Phone-based approaches, however, remain state of the art on classical benchmarks. In this paper, we propose a letter-based speech recognition system, leveraging a ConvNet acoustic model. Key ingredients of the ConvNet are Gated Linear Units and high dropout. The ConvNet is trained to map audio sequences to their corresponding letter transcriptions, either via a classical CTC approach, or via a recent variant called ASG. Coupled with a simple decoder at inference time, our system matches the best existing letter-based systems on WSJ (in word error rate), and s
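The abstract names Gated Linear Units and high dropout as the key ingredients of the ConvNet. The sketch below illustrates both in plain Python, assuming the standard GLU formulation (split the feature vector in half and multiply the first half by the sigmoid of the second) and ordinary inverted dropout. The function names and the example dropout rate are illustrative assumptions, not values or code taken from the paper.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def glu(features: list[float]) -> list[float]:
    """Gated Linear Unit over one feature vector.

    The first half of the input carries the values, the second half
    carries the gates; the output has half the input width.
    """
    if len(features) % 2 != 0:
        raise ValueError("GLU needs an even number of input features")
    half = len(features) // 2
    values, gates = features[:half], features[half:]
    return [v * sigmoid(g) for v, g in zip(values, gates)]


def dropout(features: list[float], rate: float, rng: random.Random) -> list[float]:
    """Inverted dropout: zero each feature with probability `rate`
    and rescale the survivors by 1 / (1 - rate). Training-time only."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in features]


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = glu([1.0, 2.0, 0.0, 0.0])   # gates of 0 halve the values
    print(hidden)
    print(dropout(hidden, 0.5, rng))     # 0.5 is an illustrative rate only
```

In a convolutional acoustic model the same gating would typically be applied along the channel dimension of each convolution's output, so a layer that emits 2k channels yields k gated channels.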
Related papers
- Wav2letter: An End-to-end Convnet-based Speech Recognition System (2016)
- Fully Convolutional Speech Recognition (2018)
- Advances In Joint CTC-attention Based End-to-end Speech Recognition With A Deep CNN Encoder And RNN-LM (2017)
- Advances In All-neural Speech Recognition (2016)
- Advancing CTC-CRF Based End-to-end Speech Recognition With Wordpieces And Conformers (2021)
- Residual Convolutional CTC Networks For Automatic Speech Recognition (2017)
- Furcanet: An End-to-end Deep Gated Convolutional, Long Short-term Memory, Deep Neural Networks For Single Channel Speech Separation (2019)
- End-to-end Speech Recognition Using A High Rank LSTM-CTC Based Model (2019)