On The Landscape Of Spoken Language Models: A Comprehensive Survey
2025 · Siddhant Arora, Kai-Wei Chang, Chung-Ming Chien, et al.
Abstract
The field of spoken language processing is undergoing a shift from training custom-built, task-specific models toward using and optimizing spoken language models (SLMs) which act as universal speech processing systems. This trend is similar to the progression toward universal language models that has taken place in the field of (text) natural language processing. SLMs include both "pure" language models of speech -- models of the distribution of tokenized speech sequences -- and models that combine speech encoders with text language models, often including both spoken and written input or output. Work in this area is very diverse, with a range of terminology and evaluation settings. This paper aims to contribute an improved understanding of SLMs via a unifying literature survey of recent work in the context of the evolution of the field. Our survey categorizes the work in this area by model architecture, training, and evaluation choices, and describes some key challenges and directions
Related papers
- Recent Advances In Speech Language Models: A Survey (2024)
- A Survey On Speech Large Language Models For Understanding (2024)
- Towards Holistic Evaluation Of Large Audio-language Models: A Comprehensive Survey (2026)
- Roadmap Towards Superhuman Speech Understanding Using Large Language Models (2024)
- OSUM: Advancing Open Speech Understanding Models With Limited Resources In Academia (2025)
- Towards Controllable Speech Synthesis In The Era Of Large Language Models: A Systematic Survey (2024)
- A Study On The Integration Of Pre-trained SSL, ASR, LM And SLU Models For Spoken Language Understanding (2022)
- UniSLU: Unified Spoken Language Understanding From Heterogeneous Cross-task Datasets (2025)