Multilingual Speech Recognition With A Single End-to-end Model
2017 · Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, et al.
Abstract
Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts. Specifically, we take a union of language-specific grapheme sets and train a grapheme-based sequence-to-sequence model jointly on data from all languages. We find that this model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually. By modifying the model to accept a language identifier as an additional input f
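The grapheme-union step described in the abstract can be illustrated with a minimal sketch. It assumes each Unicode code point counts as one grapheme and uses toy transcripts; the function names `build_joint_vocab` and `encode` and the language codes are hypothetical and are not taken from the paper.

```python
# Minimal sketch: build one output vocabulary as the union of
# per-language grapheme sets. Assumption: one Unicode code point
# is treated as one grapheme; the paper may segment differently.

def build_joint_vocab(transcripts_by_lang):
    """Map every symbol seen in any language to a shared integer id."""
    symbols = set()
    for texts in transcripts_by_lang.values():
        for text in texts:
            symbols.update(text)  # adds each character of the transcript
    # Sort for a deterministic id assignment.
    return {sym: idx for idx, sym in enumerate(sorted(symbols))}


def encode(text, vocab):
    """Convert a transcript into a list of shared-vocabulary ids."""
    return [vocab[ch] for ch in text]


# Toy, illustrative data only (not from the paper's training set).
toy_transcripts = {
    "hi": ["नमस्ते"],
    "ta": ["வணக்கம்"],
    "bn": ["নমস্কার"],
}

vocab = build_joint_vocab(toy_transcripts)
ids = encode("नमस्ते", vocab)

# Invert the mapping to check that encoding is lossless.
inverse = {idx: sym for sym, idx in vocab.items()}
decoded = "".join(inverse[i] for i in ids)
```

Because the scripts barely overlap, the joint vocabulary is close to the sum of the per-language inventories, and a single output layer over it lets one model emit text in any of the languages without being told which one is spoken.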
Related papers
- Large-scale Multilingual Speech Recognition With A Streaming End-to-end Model (2019)
- Multilingual End-to-end Speech Recognition With A Single Transformer On Low-resource Languages (2018)
- Multilingual Sequence-to-sequence Speech Recognition: Architecture, Transfer Learning, And Language Modeling (2018)
- Multi-dialect Speech Recognition With A Single Sequence-to-sequence Model (2017)
- A Two-stage Transliteration Approach To Improve Performance Of A Multilingual ASR (2024)
- Towards One Model To Rule All: Multilingual Strategy For Dialectal Code-switching Arabic ASR (2021)
- Massively Multilingual Adversarial Speech Recognition (2019)
- Sequence-based Multi-lingual Low Resource Speech Recognition (2018)