A Fully Differentiable Beam Search Decoder
2019 · Ronan Collobert, Awni Hannun, Gabriel Synnaeve
Abstract
We introduce a new beam search decoder that is fully differentiable, making it possible to optimize at training time through the inference procedure. Our decoder allows us to combine models which operate at different granularities (e.g. acoustic and language models). It can be used when target sequences are not aligned to input sequences by considering all possible alignments between the two. We demonstrate our approach scales by applying it to speech recognition, jointly training acoustic and word-level language models. The system is end-to-end, with gradients flowing through the whole architecture from the word-level transcriptions. Recent research efforts have shown that deep neural networks with attention-based mechanisms are powerful enough to successfully train an acoustic model from the final transcription, while implicitly learning a language model. Instead, we show that it is possible to discriminatively train an acoustic model jointly with an explicit and possibly pre-trained
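The idea of "considering all possible alignments" between an unaligned target and its input can be illustrated with a small dynamic program. The sketch below is not the authors' decoder. It assumes a simplified setting in which each target token covers one or more consecutive frames, with no blank symbol, lexicon, or language model, and the names `alignment_log_score`, `log_add`, and `emissions` are hypothetical. It sums over every monotonic alignment in log space, and because it uses a smooth log-sum-exp instead of a hard max, the resulting score would be differentiable with respect to the per-frame scores.

```python
import math

NEG_INF = float("-inf")


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def alignment_log_score(emissions, target):
    """Log-sum-exp of the scores of all monotonic alignments.

    emissions: list of T frames, each a list of per-token log-scores.
    target:    list of U token ids, with 1 <= U <= T.
    Assumption: every target token spans at least one frame, no blanks.
    """
    T, U = len(emissions), len(target)
    if U == 0 or U > T:
        raise ValueError("need 1 <= len(target) <= number of frames")

    # alpha[j]: log-sum of all partial alignments that end at the
    # current frame while emitting target[j].
    alpha = [NEG_INF] * U
    alpha[0] = emissions[0][target[0]]

    for t in range(1, T):
        new_alpha = [NEG_INF] * U
        for j in range(U):
            stay = alpha[j]                               # same token as previous frame
            advance = alpha[j - 1] if j > 0 else NEG_INF  # move on to the next token
            new_alpha[j] = emissions[t][target[j]] + log_add(stay, advance)
        alpha = new_alpha

    # Valid alignments must finish on the last target token.
    return alpha[U - 1]
```

With uniform zero scores, the result is simply the log of the number of valid alignments, which gives a quick sanity check on the recursion.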
Related papers
- Joint Beam Search Integrating CTC, Attention, And Transducer Decoders (2024)
- Robust Beam Search For Encoder-decoder Attention Based Speech Recognition Without Length Bias (2020)
- Segment-level Vectorized Beam Search Based On Partially Autoregressive Inference (2023)
- Vectorization Of Hypotheses And Speech For Faster Beam Search In Encoder Decoder-based Speech Recognition (2018)
- Integration Of Frame- And Label-synchronous Beam Search For Streaming Encoder-decoder Speech Recognition (2023)
- Navigating The Minefield Of MT Beam Search In Cascaded Streaming Speech Translation (2024)
- Beam Search Decoding Using Manner Of Articulation Detection Knowledge Derived From Connectionist Temporal Classification (2018)
- Streaming Parallel Transducer Beam Search With Fast-slow Cascaded Encoders (2022)