SpidR-Adapt: A Universal Speech Representation Model for Few-Shot Adaptation
2025 · Mahi Luthra, Jiayi Shen, Maxime Poli, et al.
Abstract
Human infants, with only a few hundred hours of speech exposure, acquire the basic units of new languages, highlighting a striking efficiency gap compared with data-hungry self-supervised speech models. To address this gap, this paper introduces SpidR-Adapt for rapid adaptation of speech units to new languages using minimal unlabeled data. We cast such low-resource speech representation learning as a meta-learning problem and construct a multi-task adaptive pre-training (MAdaPT) protocol that formulates the adaptation process as a bi-level optimization framework. To enable scalable meta-training under this framework, we propose a novel heuristic solution, first-order bi-level optimization (FOBLO), that avoids heavy computation costs. Finally, we stabilize meta-training with a robust initialization obtained through interleaved supervision, which alternates self-supervised and supervised objectives. Empirically, SpidR-Adapt achieves rapid gains in phonemic discriminability (ABX) and downstream spo…
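The abstract does not spell out how FOBLO works. As a hedged illustration of what "first-order" generally means in bi-level meta-learning, here is a minimal sketch of the standard first-order MAML-style approximation on a hypothetical toy problem: linear-regression tasks stand in for languages, the inner level adapts from shared meta-parameters on a small support set, and the outer level updates those meta-parameters from the query loss. All names (`inner_adapt`, `first_order_meta_step`, `make_task`) and hyperparameters are assumptions for illustration; this is not the paper's FOBLO, MAdaPT, or SpidR-Adapt code.

```python
import numpy as np


def mse_loss_and_grad(w, X, y):
    """Mean-squared error of a linear model y ~ X @ w, and its gradient w.r.t. w."""
    residual = X @ w - y
    loss = float(np.mean(residual ** 2))
    grad = 2.0 * X.T @ residual / len(y)
    return loss, grad


def inner_adapt(theta, X_support, y_support, inner_lr=0.1, inner_steps=5):
    """Inner level: a few plain gradient steps on the support set, starting from theta."""
    phi = theta.copy()
    for _ in range(inner_steps):
        _, g = mse_loss_and_grad(phi, X_support, y_support)
        phi = phi - inner_lr * g
    return phi


def first_order_meta_step(theta, tasks, inner_lr=0.1, inner_steps=5, outer_lr=0.05):
    """Outer level, first-order approximation.

    The exact meta-gradient is (d phi / d theta)^T grad L_query(phi); the first-order
    version drops the Jacobian d phi / d theta, so no second derivatives or
    backpropagation through the inner loop are needed.
    """
    meta_grad = np.zeros_like(theta)
    for X_s, y_s, X_q, y_q in tasks:
        phi = inner_adapt(theta, X_s, y_s, inner_lr, inner_steps)
        _, g_query = mse_loss_and_grad(phi, X_q, y_q)
        meta_grad += g_query  # query gradient at phi, applied to theta directly
    return theta - outer_lr * meta_grad / len(tasks)


def make_task(rng, shared, n=20):
    """Hypothetical toy 'language': a linear target near a shared vector,
    with noise-free data split into a support half and a query half."""
    w_true = shared + 0.1 * rng.standard_normal(shared.shape[0])
    X = rng.standard_normal((2 * n, shared.shape[0]))
    y = X @ w_true
    return X[:n], y[:n], X[n:], y[n:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shared = np.array([1.0, -2.0, 0.5, 3.0])
    theta = np.zeros(4)  # meta-initialization, learned across tasks
    for _ in range(200):
        theta = first_order_meta_step(theta, [make_task(rng, shared) for _ in range(8)])
    print("distance of meta-init from shared structure:", np.linalg.norm(theta - shared))
```

Dropping the Jacobian is what makes this kind of outer update cheap: its cost is just the inner steps plus one extra gradient evaluation per task. The abstract leaves open how FOBLO's heuristic differs from this generic approximation.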
Related papers
- Residual Adapters For Parameter-efficient ASR Adaptation To Atypical And Accented Speech (2021) · 10.74
- SMILE: Speech Meta In-context Learning For Low-resource Language Automatic Speech Recognition (2024) · 0.00
- Resource-efficient Adaptation Of Speech Foundation Models For Multi-speaker ASR (2024) · 3.58
- SLM-TTA: A Framework For Test-time Adaptation Of Generative Spoken Language Models (2025) · 0.00
- Sample Efficient Adaptive Text-to-speech (2018) · 0.00
- Meta-TTS: Meta-learning For Few-shot Speaker Adaptive Text-to-speech (2021) · 12.74
- Adapt-and-adjust: Overcoming The Long-tail Problem Of Multilingual Speech Recognition (2020) · 10.35
- How To Learn A New Language? An Efficient Solution For Self-supervised Learning Models Unseen Languages Adaption In Low-resource Scenario (2024) · 0.00