LESS: Large Language Model Enhanced Semi-supervised Learning For Speech Foundational Models Using In-the-wild Data
2025 Β· Wen Ding, Fan Qian
Abstract
Although state-of-the-art Speech Foundational Models can produce high-quality text pseudo-labels, applying Semi-Supervised Learning (SSL) for in-the-wild real-world data remains challenging due to its richer and more complex acoustics compared to curated datasets. To address the challenges, we introduce LESS (Large Language Model Enhanced Semi-supervised Learning), a versatile framework that uses Large Language Models (LLMs) to correct pseudo-labels generated on in-the-wild data. In the LESS framework, pseudo-labeled text from Automatic Speech Recognition (ASR) or Automatic Speech Translation (AST) of the unsupervised data is refined by an LLM, and further improved by a data filtering strategy. Across Mandarin ASR and Spanish-to-English AST evaluations, LESS delivers consistent gains, with an absolute Word Error Rate reduction of 3.8% on WenetSpeech, and BLEU score increase of 0.8 and 0.7, achieving 34.0 on Callhome and 64.7 on Fisher testsets respectively. These results highlight LESS
Authors
(none)
Tags
Stats
Related papers
- Unsupervised Active Learning: Optimizing Labeling Cost-effectiveness For Automatic Speech Recognition (2023)0.00
- Exploring Fine-tuning Of Large Audio Language Models For Spoken Language Understanding Under Limited Speech Data (2025)0.00
- Large Language Model Guided Decoding For Self-supervised Speech Recognition (2025)0.00
- Towards Supervised Performance On Speaker Verification With Self-supervised Learning By Leveraging Large-scale ASR Models (2024)7.50
- Zero-resource Speech Translation And Recognition With Llms (2024)3.58
- Lebenchmark: A Reproducible Framework For Assessing Self-supervised Representation Learning From Speech (2021)11.39
- LASER: Learning By Aligning Self-supervised Representations Of Speech For Improving Content-related Tasks (2024)4.52
- Fine-tuning Strategies For Faster Inference Using Speech Self-supervised Models: A Comparative Study (2023)8.35