Efficient Sequence Training Of Attention Models Using Approximative Recombination
2021 · Nils-Philipp Wynands, Wilfried Michel, Jan Rosendahl, et al.
Abstract
Sequence discriminative training is an effective way to improve the performance of an automatic speech recognition system. It requires, however, a sum over all possible word sequences, which is intractable to compute in practice. Current state-of-the-art systems with unlimited label context circumvent this problem by limiting the summation to an n-best list of relevant competing hypotheses obtained from beam search. This work proposes to perform (approximative) recombination of hypotheses during beam search if they share a common local history. The error incurred by this approximation is analyzed, and it is shown that with this technique the effective beam size can be increased by several orders of magnitude without significantly increasing the computational requirements. Lastly, it is shown that the technique can be used to perform sequence discriminative training effectively for attention-based encoder-decoder acoustic models on the LibriSpeech task.
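As a rough illustration of the recombination idea described in the abstract, the following Python sketch merges beam hypotheses that share their most recent labels. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `logsumexp` and `recombine`, the parameter `history_len`, and the choice to keep the best-scoring member of each group while summing the probability mass of the whole group are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def recombine(hyps, history_len):
    """Merge hypotheses whose last `history_len` labels agree.

    hyps: list of (labels, log_score) pairs, where labels is a tuple.
    Assumption for this sketch: each group is represented by its
    best-scoring member, and the merged score is the log of the summed
    probabilities of all members. For a model with unlimited label
    context this is only an approximation, because the discarded
    members would have produced different future scores.
    """
    if history_len < 1:
        raise ValueError("history_len must be at least 1")
    groups = defaultdict(list)
    for labels, score in hyps:
        groups[labels[-history_len:]].append((labels, score))
    merged = []
    for group in groups.values():
        best_labels, _ = max(group, key=lambda h: h[1])
        merged.append((best_labels, logsumexp([s for _, s in group])))
    return merged


# Toy beam: the first two hypotheses share the local history ("b", "c").
beam = [
    (("a", "b", "c"), math.log(0.4)),
    (("x", "b", "c"), math.log(0.2)),
    (("a", "b", "d"), math.log(0.1)),
]
for labels, score in recombine(beam, history_len=2):
    print(labels, round(math.exp(score), 3))
```

In this toy beam, two of the three hypotheses collapse into one entry that carries their combined probability mass of roughly 0.6, so a fixed number of beam slots accounts for more word sequences than it explicitly stores.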