Towards Better Decoding And Language Model Integration In Sequence To Sequence Models
2016 · Jan Chorowski, Navdeep Jaitly
Abstract
The recently proposed Sequence-to-Sequence (seq2seq) framework advocates replacing complex data processing pipelines, such as an entire automatic speech recognition system, with a single neural network trained in an end-to-end fashion. In this contribution, we analyse an attention-based seq2seq speech recognition system that directly transcribes recordings into characters. We observe two shortcomings: overconfidence in its predictions and a tendency to produce incomplete transcriptions when language models are used. We propose practical solutions to both problems, achieving competitive speaker-independent word error rates (WER) on the Wall Street Journal dataset: without separate language models we reach 10.6% WER, while together with a trigram language model we reach 6.7% WER.
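For readers unfamiliar with the metric quoted above, WER is conventionally the word-level edit distance (substitutions, deletions, and insertions) between a hypothesis and its reference transcript, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the scoring tool used in the paper; the function name `word_error_rate` and the whitespace tokenisation are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenises on whitespace and applies no text
    normalisation, which real scoring pipelines usually do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # A truncated hypothesis is charged one deletion per missing word.
    print(word_error_rate("the cat sat on the mat", "the cat sat"))
```

Under this definition an incomplete transcription is penalised through deletions: dropping the last three words of a six-word reference yields a WER of 0.5.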
Related papers
- Acoustic-to-word Recognition With Sequence-to-sequence Models (2018) · 6.77
- State-of-the-art Speech Recognition With Sequence-to-sequence Models (2017) · 21.01
- Single Headed Attention Based Sequence-to-sequence Model For State-of-the-art Results On Switchboard (2020) · 0.00
- Multilingual Sequence-to-sequence Speech Recognition: Architecture, Transfer Learning, And Language Modeling (2018) · 13.84
- On Using 2D Sequence-to-sequence Models For Speech Recognition (2019) · 0.00
- Exploring Neural Transducers For End-to-end Speech Recognition (2017) · 14.90
- High Performance Sequence-to-sequence Model For Streaming Speech Recognition (2020) · 3.58
- An Online Sequence-to-sequence Model For Noisy Speech Recognition (2017) · 0.00