Adaptive Activation Network For Low Resource Multilingual Speech Recognition
2022 · Jian Luo, Jianzong Wang, Ning Cheng, et al.
Abstract
Low-resource automatic speech recognition (ASR) is a useful but thorny task, since deep learning ASR models usually need huge amounts of training data. Most existing models establish a bottleneck (BN) layer by pre-training on a large source language and then transferring it to the low-resource target language. In this work, we introduce an adaptive activation network in the upper layers of the ASR model and apply different activation functions to different languages. We also propose two approaches to train the model: (1) cross-lingual learning, which replaces the activation function of the source language with that of the target language, and (2) multilingual learning, which jointly trains the Connectionist Temporal Classification (CTC) loss of each language and the relevance of different languages. Our experiments on the IARPA Babel datasets demonstrate that our approaches outperform from-scratch training and traditional bottleneck-feature-based methods. In addition, combining the cross-lingual learning and mu
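The idea of sharing an upper layer across languages while switching the activation function per language can be illustrated with a minimal sketch. The abstract does not say what form the adaptive activation takes, so everything here is an assumption made for illustration: the activation is a leaky-ReLU-style function whose negative-side slope is the only language-specific parameter, and the names `AdaptiveActivation`, `shared_linear`, `upper_layer`, `add_language`, and the language keys `"src"` and `"tgt"` are hypothetical. It is not the authors' implementation.

```python
class AdaptiveActivation:
    """Hypothetical per-language activation (assumed form, not from the paper).

    Each language owns one parameter: the slope applied to negative inputs.
    """

    def __init__(self, slopes):
        # slopes: dict mapping a language key to its negative-side slope
        self.slopes = dict(slopes)

    def __call__(self, values, lang):
        slope = self.slopes[lang]
        return [v if v >= 0 else slope * v for v in values]

    def add_language(self, lang, init_from):
        # Cross-lingual step (assumed): give the target language its own
        # activation parameter, initialised from the source language.
        self.slopes[lang] = self.slopes[init_from]


def shared_linear(inputs, weights, bias):
    # Weights and bias are shared by every language.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def upper_layer(inputs, lang, weights, bias, activation):
    # Shared linear transform followed by the language-specific activation.
    return activation(shared_linear(inputs, weights, bias), lang)


# Toy values chosen only to make the behaviour visible.
weights = [[1.0, -1.0], [0.5, 0.5]]
bias = [0.0, -1.0]
activation = AdaptiveActivation({"src": 0.1})

src_out = upper_layer([1.0, 2.0], "src", weights, bias, activation)

# Swap in a target-language activation; the shared weights are untouched.
activation.add_language("tgt", init_from="src")
activation.slopes["tgt"] = 0.3  # stands in for fine-tuning on target data
tgt_out = upper_layer([1.0, 2.0], "tgt", weights, bias, activation)
```

In this sketch the two languages produce different outputs from the same shared weights only because their activation parameters differ. The multilingual variant, which jointly trains the per-language CTC losses, is not covered here.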
Related papers
- Sequence-based Multi-lingual Low Resource Speech Recognition (2018)
- Multilingual Sequence-to-sequence Speech Recognition: Architecture, Transfer Learning, And Language Modeling (2018)
- Learning Cross-lingual Mappings For Data Augmentation To Improve Low-resource Speech Recognition (2023)
- Transfer Learning Of Language-independent End-to-end ASR With Language Model Fusion (2018)
- Meta Learning For End-to-end Low-resource Speech Recognition (2019)
- Multilingual Adaptation Of RNN Based ASR Systems (2017)
- Master-ASR: Achieving Multilingual Scalability And Low-resource Adaptation In ASR With Modular Learning (2023)
- ASR Data Augmentation In Low-resource Settings Using Cross-lingual Multi-speaker TTS And Cross-lingual Voice Conversion (2022)