The Universal Personalizer: Few-shot Dysarthric Speech Recognition Via Meta-learning
2025 · Dhruuv Agarwal, Harry Zhang, Yang Yu, et al.
Abstract
Personalizing dysarthric ASR is hindered by demanding enrollment collection and per-user training. We propose a hybrid meta-training method for a single model, enabling zero-shot and few-shot on-the-fly personalization via in-context learning (ICL). On Euphonia, it achieves 13.9% Word Error Rate (WER), surpassing speaker-independent baselines (17.5%). On SAP Test-1, our 5.3% WER outperforms the challenge-winning team (5.97%). On Test-2, our 9.49% trails only the winner (8.11%) but without relying on techniques like offline model-merging or custom audio chunking. Curation yields a 40% WER reduction using random same-speaker examples, validating active personalization. While static text curation fails to beat this baseline, oracle similarity reveals substantial headroom, highlighting dynamic acoustic retrieval as the next frontier. Data ablations confirm rapid low-resource speaker adaptation, establishing the model as a practical personalized solution.
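To make the reported comparisons concrete, the following is a minimal sketch of how WER and a relative WER reduction are conventionally computed. It is not the authors' evaluation code: the function names `wer` and `relative_reduction` are hypothetical, and it assumes plain whitespace tokenization with no text normalization. The only figures taken from the abstract are the WER values passed to `relative_reduction`.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length.

    Assumption: whitespace tokenization, no casing or punctuation normalization.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the ref prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at zero cost
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, system: float) -> float:
    """Relative WER reduction of `system` over `baseline`, as a fraction."""
    return (baseline - system) / baseline


# WER values (in percent) as reported in the abstract.
euphonia = relative_reduction(17.5, 13.9)   # speaker-independent baseline vs. this model
sap_test1 = relative_reduction(5.97, 5.3)   # challenge winner vs. this model
print(f"Euphonia: {euphonia:.1%}, SAP Test-1: {sap_test1:.1%}")
```

Under these assumptions, the Euphonia result corresponds to a relative reduction of roughly 20.6% over the speaker-independent baseline, and the SAP Test-1 result to roughly 11.2% over the challenge-winning system.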
Related papers
- Meta-TTS: Meta-learning For Few-shot Speaker Adaptive Text-to-speech (2021) · 12.74
- Efficient Personalized Speech Enhancement Through Self-supervised Learning (2021) · 10.21
- Fast Contextual Adaptation With Neural Associative Memory For On-device Personalized Speech Recognition (2021) · 9.76
- Contextual Adapters For Personalized Speech Recognition In Neural Transducers (2022) · 12.47
- Personalization For BERT-based Discriminative Speech Recognition Rescoring (2023) · 5.24
- Text Is All You Need: Personalizing ASR Models Using Controllable Speech Synthesis (2023) · 7.16
- Improved End-to-end Dysarthric Speech Recognition Via Meta-learning Based Model Re-initialization (2020) · 10.48
- Self-supervised Learning From Contrastive Mixtures For Personalized Speech Enhancement (2020) · 0.00