Improved Self-supervised Multilingual Speech Representation Learning Combined With Auxiliary Language Information
2022 · Fenglin Ding, Genshun Wan, Pengcheng Li, et al.
Abstract
Multilingual end-to-end models have shown great improvement over monolingual systems. With the development of pre-training methods for speech, self-supervised multilingual speech representation learning such as XLSR has succeeded in improving the performance of multilingual automatic speech recognition (ASR). However, as with supervised learning, multilingual pre-training may suffer from language interference, which in turn limits the usefulness of multilingual systems. In this paper, we introduce several techniques that improve self-supervised multilingual pre-training by leveraging auxiliary language information during the pre-training stage: language adversarial training, language embedding, and language adaptive training. We conduct experiments on a multilingual ASR task covering 16 languages. Our experimental results demonstrate a 14.3% relative gain over the standard XLSR model and a 19.8% relative gain over the multilingual model without pre-training.
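The two reported gains also imply how much the standard XLSR model improves over the multilingual model without pre-training. The sketch below works that out under one assumption not stated in the abstract: that both percentages are relative reductions of the same error metric on the same test set, so that err_new = (1 - gain) * err_reference. The function name is illustrative only.

```python
def implied_relative_gain(gain_vs_a: float, gain_vs_b: float) -> float:
    """Relative error reduction of system A over system B, implied by a
    third system's relative gains over each of them.

    Assumption: "relative gain" means a relative reduction of one shared
    error metric, i.e. err_new = (1 - gain) * err_reference.
    """
    # err_new = (1 - gain_vs_a) * err_a = (1 - gain_vs_b) * err_b
    # => err_a / err_b = (1 - gain_vs_b) / (1 - gain_vs_a)
    ratio_a_to_b = (1.0 - gain_vs_b) / (1.0 - gain_vs_a)
    return 1.0 - ratio_a_to_b


# A = standard XLSR, B = multilingual model without pre-training.
xlsr_vs_no_pretraining = implied_relative_gain(0.143, 0.198)
print(f"{xlsr_vs_no_pretraining:.1%}")  # prints 6.4%
```

Under that assumption, standard XLSR alone would account for roughly a 6.4% relative gain over the no-pre-training baseline, with the auxiliary language information contributing the rest.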
Related papers
- Language Adaptive Cross-lingual Speech Representation Learning With Sparse Sharing Sub-networks (2022)
- XLST: Cross-lingual Self-training To Learn Multilingual Representation For Low Resource Speech Recognition (2021)
- Self-supervised Adaptive Pre-training Of Multilingual Speech Models For Language And Dialect Identification (2023)
- Massively Multilingual Adversarial Speech Recognition (2019)
- Unsupervised Cross-lingual Representation Learning For Speech Recognition (2020)
- Improved Language Identification Through Cross-lingual Self-supervised Learning (2021)
- Multilingual Speech Recognition Using Knowledge Transfer Across Learning Processes (2021)
- Leveraging Multilingual Self-supervised Pretrained Models For Sequence-to-sequence End-to-end Spoken Language Understanding (2023)