Improving Noisy Student Training For Low-resource Languages In End-to-end ASR Using CycleGAN And Inter-domain Losses
2024 · Chia-Yu Li, Ngoc Thang Vu
Abstract
Training a semi-supervised end-to-end speech recognition system with noisy student training has significantly improved performance. However, this approach requires a substantial amount of paired speech-text data and unlabeled speech, which is costly to obtain for low-resource languages. This paper therefore considers a more extreme case of semi-supervised end-to-end automatic speech recognition, with limited paired speech-text data, limited unlabeled speech (less than five hours), and abundant external text. First, we observe improved performance when training the model with our previous semi-supervised method, "CycleGAN and inter-domain losses," using external text alone. Second, we enhance "CycleGAN and inter-domain losses" with automatic hyperparameter tuning and call the result "enhanced CycleGAN inter-domain losses." Third, we integrate it into the noisy student training pipeline for low-resource scenarios. Our experimental results, conducted on six non-English languages
Related papers
- Improving Semi-supervised End-to-end Automatic Speech Recognition Using CycleGAN And Inter-domain Losses (2022) 3.58
- Speech Enhancement Based On CycleGAN With Noise-informed Training (2021) 5.84
- Pretraining By Backtranslation For End-to-end ASR In Low-resource Settings (2018) 0.00
- Semi-supervised Sequence-to-sequence ASR Using Unpaired Speech And Text (2019) 0.00
- Generative Adversarial Training Data Adaptation For Very Low-resource Automatic Speech Recognition (2020) 6.77
- Semi-supervised Training For Improving Data Efficiency In End-to-end Speech Synthesis (2018) 13.28
- A Multi-discriminator CycleGAN For Unsupervised Non-parallel Speech Domain Adaptation (2018) 9.76
- Data Augmentation Methods For End-to-end Speech Recognition On Distant-talk Scenarios (2021) 6.34