Using Multi-task Learning To Improve The Performance Of Acoustic-to-word And Conventional Hybrid Models
2019 · Thai-Son Nguyen, Sebastian Stueker, Alex Waibel
Abstract
Acoustic-to-word (A2W) models that allow direct mapping from acoustic signals to word sequences are an appealing approach to end-to-end automatic speech recognition due to their simplicity. However, prior works have shown that modelling A2W typically encounters issues of data sparsity that prevent training such a model directly. So far, pre-training initialization is the only approach proposed to deal with this issue. In this work, we propose to build a shared neural network and optimize A2W and conventional hybrid models in a multi-task manner. Our results show that training an A2W model is much more stable with our multi-task model without pre-training initialization, and results in a significant improvement compared to a baseline model. Experiments also reveal that the performance of a hybrid acoustic model can be further improved when jointly training with a sequence-level optimization criterion such as acoustic-to-word.
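The joint optimization described in the abstract can be sketched as a weighted sum of two losses computed from the outputs of one shared network. This is a minimal illustration and not the authors' implementation. The abstract does not name the individual criteria, so the sketch assumes a CTC loss for the A2W head and a frame-level cross-entropy for the hybrid head. The interpolation `weight` and all function names are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one frame of scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def logsumexp(values):
    """log(sum(exp(v))) that tolerates -inf entries."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def frame_cross_entropy(state_logits, state_targets):
    """Hybrid head (assumed criterion): mean per-frame negative log-likelihood."""
    total = 0.0
    for logits, target in zip(state_logits, state_targets):
        total -= log_softmax(logits)[target]
    return total / len(state_targets)

def ctc_loss(word_logits, words, blank=0):
    """A2W head (assumed criterion): CTC negative log-likelihood of a word sequence."""
    log_probs = [log_softmax(frame) for frame in word_logits]
    # Interleave blanks: [w1, w2] -> [blank, w1, blank, w2, blank]
    ext = [blank]
    for w in words:
        ext += [w, blank]
    size = len(ext)
    alpha = [-math.inf] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [-math.inf] * size
        for s in range(size):
            terms = [alpha[s]]
            if s >= 1:
                terms.append(alpha[s - 1])
            # Skipping a blank is allowed only between two different words.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(terms) + log_probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last word or on the trailing blank.
    return -logsumexp(alpha[-2:] if size > 1 else alpha[-1:])

def multitask_loss(word_logits, words, state_logits, state_targets, weight=0.5):
    """Weighted sum of both task losses; `weight` is an assumed hyperparameter."""
    a2w = ctc_loss(word_logits, words)
    hybrid = frame_cross_entropy(state_logits, state_targets)
    return weight * a2w + (1.0 - weight) * hybrid

# Toy usage: 2 frames, 3 word classes (index 0 is blank), 4 hybrid states.
word_logits = [[0.1, 2.0, -1.0], [0.3, 1.5, 0.2]]
state_logits = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]]
print(multitask_loss(word_logits, [1], state_logits, [0, 2]))
```

In a real system both sets of logits would come from two output layers on top of the same shared layers, so the gradient of the combined loss updates the shared parameters for both tasks at once.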
Related papers
- Improving Hybrid CTC/Attention End-to-end Speech Recognition With Pretrained Acoustic And Language Model (2021)
- E2E-based Multi-task Learning Approach To Joint Speech And Accent Recognition (2021)
- Multi-task Learning Of Deep Neural Networks For Audio Visual Automatic Speech Recognition (2017)
- Multi-task Voice Activated Framework Using Self-supervised Learning (2021)
- Knowledge Distillation From Language Model To Acoustic Model: A Hierarchical Multi-task Learning Approach (2021)
- Multi-task Language Modeling For Improving Speech Recognition Of Rare Words (2020)
- A Highly Adaptive Acoustic Model For Accurate Multi-dialect Speech Recognition (2022)
- Improvements To Embedding-matching Acoustic-to-word ASR Using Multiple-hypothesis Pronunciation-based Embeddings (2022)