Continual Learning Optimizations For Auto-regressive Decoder Of Multilingual ASR Systems
2024 · Chin Yuen Kwok, Jia Qi Yip, Eng Siong Chng
Abstract
Continual Learning (CL) involves fine-tuning pre-trained models on new data while maintaining performance on the pre-training data. This is particularly relevant for expanding the capabilities of multilingual ASR (MASR) systems. However, existing CL methods, designed mainly for computer vision and reinforcement learning tasks, often yield sub-optimal results when applied directly to MASR. We hypothesise that this is because CL of the auto-regressive decoder in the MASR model is difficult. To verify this, we propose four optimizations for the decoder: decoder-layer gradient surgery, freezing unused token embeddings, suppressing the output of newly added tokens, and learning rate re-scaling. Our experiments on adapting Whisper to 10 unseen languages from the Common Voice dataset demonstrate that these optimizations reduce the Average Word Error Rate (AWER) of pre-trained languages from 14.2% to 12.4% compared with Experience Replay, without compromising the AWER of new languages.
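The abstract names the decoder optimizations but does not give their formulations, so the following is only a minimal, framework-free sketch of how three of them might look in principle. Everything here is an assumption made for illustration: the function names are hypothetical, gradients and logits are plain Python lists, and the gradient-surgery step uses a generic projection rule (drop the component of the new-language gradient that conflicts with the pre-trained-language gradient), which may differ from the paper's decoder-layer variant. Learning rate re-scaling is not covered.

```python
import math


def project_conflicting(g_new, g_old):
    """Generic gradient-surgery step (assumed form, not the paper's exact rule).

    If the new-language gradient g_new points against the pre-trained-language
    gradient g_old (negative dot product), remove the component of g_new along
    g_old. Otherwise return g_new unchanged.
    """
    dot = sum(a * b for a, b in zip(g_new, g_old))
    if dot >= 0:
        return list(g_new)
    # dot < 0 implies g_old is non-zero, so the division is safe.
    norm_sq = sum(b * b for b in g_old)
    scale = dot / norm_sq
    return [a - scale * b for a, b in zip(g_new, g_old)]


def mask_embedding_grads(grad_rows, used_token_ids):
    """Freeze unused token embeddings by zeroing their gradient rows.

    grad_rows[i] is the gradient of the embedding for token id i. Rows whose
    token id is not in used_token_ids (hypothetically, tokens absent from the
    new-language data) receive a zero gradient and therefore stay unchanged
    under a plain gradient step.
    """
    return [
        list(row) if i in used_token_ids else [0.0] * len(row)
        for i, row in enumerate(grad_rows)
    ]


def suppress_tokens(logits, suppressed_ids):
    """Suppress selected tokens at decoding time.

    Sets the logits of suppressed_ids (hypothetically, tokens added for the new
    languages) to -inf so they can never be selected.
    """
    return [
        -math.inf if i in suppressed_ids else x
        for i, x in enumerate(logits)
    ]
```

For instance, `project_conflicting([1.0, -1.0], [0.0, 1.0])` removes the component that opposes the second gradient, and `suppress_tokens(logits, new_token_ids)` could be applied before taking the argmax when decoding a pre-trained language. In a real training setup these operations would act on framework tensors rather than lists.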
Related papers
- Continual Learning For Monolingual End-to-end Automatic Speech Recognition (2021)
- Rehearsal-free Online Continual Learning For Automatic Speech Recognition (2023)
- Adapting Whisper For Code-switching Through Encoding Refining And Language-aware Decoding (2024)
- Dual-pipeline With Low-rank Adaptation For New Language Integration In Multilingual ASR (2024)
- Whisper-LM: Improving ASR Models With Language Models For Low-resource Languages (2025)
- Continuously Learning New Words In Automatic Speech Recognition (2024)
- Unsupervised Online Continual Learning For Automatic Speech Recognition (2024)
- Weighted Cross-entropy For Low-resource Languages In Multilingual Speech Recognition (2024)