A Study Of Multilingual End-to-end Speech Recognition For Kazakh, Russian, And English
2021 · Saida Mussakhojayeva, Yerbolat Khassanov, Huseyin Atakan Varol
Abstract
We study training a single end-to-end (E2E) automatic speech recognition (ASR) model for three languages used in Kazakhstan: Kazakh, Russian, and English. We first describe the development of a multilingual E2E ASR system based on Transformer networks and then assess it extensively on these three languages. We also compare two ways of constructing the output grapheme set: combined and independent. Furthermore, we evaluate the impact of language models (LMs) and data augmentation techniques on the recognition performance of the multilingual E2E ASR system. In addition, we present several datasets for training and evaluation. Experimental results show that the multilingual models achieve performance comparable to that of monolingual baselines with a similar number of parameters. Our best monolingual and multilingual models achieved average word error rates of 20.9% and 20.5% on the combined test set, respectively. To ensure the reproducibility of our experiments and results, we share our training recipe
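The difference between the combined and independent grapheme sets mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the construction, so the following is an assumption: "combined" is read as one shared character inventory across languages, and "independent" as language-tagged characters that are never shared. The toy transcripts, the language codes `kk`, `ru`, `en`, and both function names are hypothetical and not taken from the authors' recipe.

```python
# Toy transcripts keyed by a hypothetical language code (illustrative only).
corpus = {
    "kk": ["сәлем әлем"],
    "ru": ["привет мир"],
    "en": ["hello world"],
}


def combined_grapheme_set(corpus):
    """Assumed 'combined' variant: one shared inventory, so a character
    used by several languages becomes a single output unit."""
    return sorted(
        {
            ch
            for texts in corpus.values()
            for text in texts
            for ch in text
            if not ch.isspace()
        }
    )


def independent_grapheme_set(corpus):
    """Assumed 'independent' variant: every character is tagged with its
    language, so no output unit is shared between languages."""
    return sorted(
        {
            f"{ch}_{lang}"
            for lang, texts in corpus.items()
            for text in texts
            for ch in text
            if not ch.isspace()
        }
    )


combined = combined_grapheme_set(corpus)
independent = independent_grapheme_set(corpus)

# Kazakh and Russian both use Cyrillic, so the combined set is smaller.
print(len(combined), len(independent))
```

Under this reading, a Cyrillic letter that occurs in both the Kazakh and the Russian transcripts appears once in the combined set and twice, with different tags, in the independent set. The combined variant gives a smaller output layer, while the independent variant keeps the languages' output units separate.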
Related papers
- Multilingual Speech Recognition With A Single End-to-end Model (2017)
- Large-scale Multilingual Speech Recognition With A Streaming End-to-end Model (2019)
- A Two-stage Transliteration Approach To Improve Performance Of A Multilingual ASR (2024)
- Multilingual Speech Recognition Using Knowledge Transfer Across Learning Processes (2021)
- Towards One Model To Rule All: Multilingual Strategy For Dialectal Code-switching Arabic ASR (2021)
- Multilingual End-to-end Speech Recognition With A Single Transformer On Low-resource Languages (2018)
- End-to-end ASR For Code-switched Hindi-english Speech (2019)
- A Comparative Study On Neural Architectures And Training Methods For Japanese Speech Recognition (2021)