Multi-scale Alignment And Contextual History For Attention Mechanism In Sequence-to-sequence Model
2018 · Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Abstract
A sequence-to-sequence model is a neural network module for mapping between two sequences of different lengths. It has three core modules: encoder, decoder, and attention. Attention is the bridge that connects the encoder and decoder modules and improves model performance in many tasks. In this paper, we propose two ideas to improve sequence-to-sequence model performance by enhancing the attention module. First, we maintain the history of the location and the expected context from several previous time-steps. Second, we apply multiscale convolution from several previous attention vectors to the current decoder state. We apply our proposed framework to sequence-to-sequence speech recognition and text-to-speech systems. The results show that our proposed extension can improve performance significantly compared to a standard attention baseline.
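The two ideas in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract gives no equations, so the additive (tanh) scoring form, the fixed averaging filters standing in for learned convolution kernels, the mean over past contexts as the "history" summary, and all names (`multiscale_location_features`, `attention_step`, `W_enc`, `W_hist`, `W_loc`, `v`) are assumptions made only for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def multiscale_location_features(prev_alignments, kernel_sizes=(3, 5, 9)):
    """Convolve each previous alignment vector with filters of several widths.

    prev_alignments: (K, S) array, the last K alignment vectors over S encoder steps.
    Returns an (S, K * len(kernel_sizes)) feature matrix.
    Assumption: fixed averaging filters stand in for learned kernels.
    """
    feats = []
    for a in prev_alignments:
        for k in kernel_sizes:
            kernel = np.ones(k) / k
            feats.append(np.convolve(a, kernel, mode="same"))
    return np.stack(feats, axis=1)

def attention_step(enc, dec_state, prev_alignments, prev_contexts, params):
    """One attention step using alignment history and context history.

    enc: (S, D) encoder states; dec_state: (H,) current decoder state;
    prev_alignments: (K, S); prev_contexts: (K, D).
    """
    loc = multiscale_location_features(prev_alignments)      # (S, F)
    hist = prev_contexts.mean(axis=0)                         # (D,) assumed history summary
    query = dec_state + hist @ params["W_hist"]               # (H,)
    hidden = np.tanh(enc @ params["W_enc"] + query + loc @ params["W_loc"])  # (S, H)
    align = softmax(hidden @ params["v"])                     # (S,)
    context = align @ enc                                     # (D,) expected context
    return align, context

# Toy usage with random values; sizes are arbitrary.
rng = np.random.default_rng(0)
S, D, H, K = 12, 4, 5, 2
F = K * 3  # K previous alignments times three kernel sizes
params = {
    "W_enc": rng.normal(size=(D, H)),
    "W_hist": rng.normal(size=(D, H)),
    "W_loc": rng.normal(size=(F, H)),
    "v": rng.normal(size=H),
}
enc = rng.normal(size=(S, D))
dec_state = rng.normal(size=H)
prev_alignments = np.full((K, S), 1.0 / S)  # uniform alignments as a starting history
prev_contexts = np.zeros((K, D))
align, context = attention_step(enc, dec_state, prev_alignments, prev_contexts, params)
```

In a trained model the filters and projection matrices would be learned, and the history buffers would be updated with each new `align` and `context` after every decoding step.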
Related papers
- Supervised Attention In Sequence-to-sequence Models For Speech Recognition (2022) 5.84
- Advancing Connectionist Temporal Classification With Attention Modeling (2018) 11.49
- Multimodal Grounding For Sequence-to-sequence Speech Recognition (2018) 8.82
- State-of-the-art Speech Recognition With Sequence-to-sequence Models (2017) 21.01
- Improving Transformer-based Conversational ASR By Inter-sentential Attention Mechanism (2022) 7.50
- Hierarchical Context-aware Transformers For Non-autoregressive Text To Speech (2021) 5.24
- Forward Attention In Sequence-to-sequence Acoustic Modelling For Speech Synthesis (2018) 12.10
- On Using 2D Sequence-to-sequence Models For Speech Recognition (2019) 0.00