Focus On The Present: A Regularization Method For The ASR Source-target Attention Layer
2020 · Nanxin Chen, Piotr Żelasko, Jesús Villalba, et al.
Abstract
This paper introduces a novel method for diagnosing the source-target attention in state-of-the-art end-to-end speech recognition models trained jointly with connectionist temporal classification (CTC) and attention. Our method relies on the fact that CTC and source-target attention both act on the same encoder representations. To understand what the attention does, CTC is applied to compute token posteriors given the attention outputs. We found that the source-target attention heads are able to predict several tokens ahead of the current one. Inspired by this observation, we propose a new regularization method that leverages CTC to make source-target attention focus more on the frames corresponding to the output token currently being predicted by the decoder. Experiments show stable relative improvements of up to 7% and 13% with the proposed regularization on TED-LIUM 2 and LibriSpeech, respectively.
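Because CTC and the attention decoder share the encoder outputs, and a source-target attention context is a weighted sum over encoder frames, the CTC output layer can be applied to that context to read off one token posterior per decoder step. The NumPy sketch below illustrates that diagnostic under simplifying assumptions: the context is taken to live in the same space as the encoder outputs (per-head value and output projections are ignored), blank is index 0, and the function names are hypothetical. The regularizer shown (negative log CTC posterior of the token currently being predicted) and the "look-ahead offset" heuristic are illustrative guesses at the idea, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_token_posteriors(enc, attn, W_ctc, b_ctc):
    """Apply the (shared) CTC output layer to attention context vectors.

    enc:   (T, D) encoder outputs used by both CTC and attention
    attn:  (L, T) source-target attention weights, one row per decoder step
    W_ctc: (D, V) CTC projection, b_ctc: (V,) bias (assumed shapes)
    Returns an (L, V) array of token posteriors.
    """
    context = attn @ enc  # (L, D): attention-weighted sums of encoder frames
    return softmax(context @ W_ctc + b_ctc)


def focus_regularizer(posteriors, targets, eps=1e-12):
    """Hypothetical regularizer: mean negative log CTC posterior of the
    target token at each decoder step. It is small when each context
    vector points at the frames of the token being predicted."""
    steps = np.arange(len(targets))
    return float(-np.mean(np.log(posteriors[steps, targets] + eps)))


def lookahead_offsets(posteriors, targets, blank=0):
    """For each decoder step i, take the most likely non-blank token and
    return how many positions ahead of i it first appears in the target
    sequence (0 = the attention is on the present, -1 = not found)."""
    offsets = []
    for i, row in enumerate(posteriors):
        masked = row.copy()
        masked[blank] = -np.inf  # ignore the CTC blank symbol
        token = int(np.argmax(masked))
        later = [j - i for j in range(i, len(targets)) if targets[j] == token]
        offsets.append(later[0] if later else -1)
    return offsets
```

On a toy setup where frame t encodes token t+1, attention that sits on the matching frame gives offsets `[0, 0, 0]`, while attention shifted one frame ahead gives positive offsets and a larger regularizer value, mirroring the "predicting tokens ahead" behaviour described above.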
Related papers
- An Improved Hybrid CTC-attention Model For Speech Recognition (2018)
- Streaming Audio-visual Speech Recognition With Alignment Regularization (2022)
- Joint CTC-attention Based End-to-end Speech Recognition Using Multi-task Learning (2016)
- CR-CTC: Consistency Regularization On CTC For Improved Speech Recognition (2024)
- Advances In Joint CTC-attention Based End-to-end Speech Recognition With A Deep CNN Encoder And RNN-LM (2017)
- Self-attention Networks For Connectionist Temporal Classification In Speech Recognition (2019)
- Relaxed Attention: A Simple Method To Boost Performance Of End-to-end Automatic Speech Recognition (2021)
- Integrating Source-channel And Attention-based Sequence-to-sequence Models For Speech Recognition (2019)