Multilingual End-to-end Speech Recognition With A Single Transformer On Low-resource Languages
2018 · Shiyu Zhou, Shuang Xu, Bo Xu
Abstract
Sequence-to-sequence attention-based models integrate an acoustic, pronunciation and language model into a single neural network, which makes them very suitable for multilingual automatic speech recognition (ASR). In this paper, we are concerned with multilingual speech recognition on low-resource languages by a single Transformer, one of the sequence-to-sequence attention-based models. Sub-words are employed as the multilingual modeling unit without using any pronunciation lexicon. First, we show that a single multilingual ASR Transformer performs well on low-resource languages despite some language confusion. We then look at incorporating language information into the model by inserting the language symbol at the beginning or at the end of the original sub-word sequence, under the condition that language information is known during training. Experiments on CALLHOME datasets demonstrate that the multilingual ASR Transformer with the language symbol at the end performs better and can ob
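The language-symbol idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `add_language_symbol`, the `<lang>` token format, and the example sub-words are all assumptions made for illustration. The sketch only shows how a training target could be built by placing a language symbol at the start or at the end of a sub-word sequence.

```python
from typing import List


def add_language_symbol(subwords: List[str], lang: str, position: str = "end") -> List[str]:
    """Return a new target sequence with a language symbol attached.

    The "<lang>" token format is a hypothetical choice for this sketch;
    the abstract does not specify how the symbol is written.
    """
    symbol = f"<{lang}>"
    if position == "start":
        # Language symbol precedes the original sub-word sequence.
        return [symbol] + list(subwords)
    if position == "end":
        # Language symbol follows the original sub-word sequence.
        return list(subwords) + [symbol]
    raise ValueError("position must be 'start' or 'end'")


# Hypothetical sub-word sequence for one training utterance.
target = ["_hel", "lo", "_wor", "ld"]
target_start = add_language_symbol(target, "en", position="start")
target_end = add_language_symbol(target, "en", position="end")
```

In this sketch the input list is left unmodified, so the same sub-word sequence can be used to build both variants for comparison.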
Related papers
- Multilingual Speech Recognition With A Single End-to-end Model (2017) 16.05
- Sequence-based Multi-lingual Low Resource Speech Recognition (2018) 12.40
- Multilingual Sequence-to-sequence Speech Recognition: Architecture, Transfer Learning, And Language Modeling (2018) 13.84
- Transformer-transducers For Code-switched Speech Recognition (2020) 10.97
- Multitask Learning And Joint Optimization For Transformer-rnn-transducer Speech Recognition (2020) 8.09
- Fast Offline Transformer-based End-to-end Automatic Speech Recognition For Real-world Applications (2021) 7.16
- Adaptive Activation Network For Low Resource Multilingual Speech Recognition (2022) 0.00
- Transfer Learning Of Language-independent End-to-end ASR With Language Model Fusion (2018) 0.00