Towards Holistic Evaluation of Large Audio-Language Models: A Comprehensive Survey
2026 · Chih-Kai Yang, Neo S. Ho, Hung-Yi Lee
Abstract
arXiv:2505.15957v4
With advancements in large audio-language models (LALMs), which enhance large language models (LLMs) with auditory capabilities, these models are expected to demonstrate universal proficiency across various auditory tasks. While numerous benchmarks have emerged to assess LALMs' performance, they remain fragmented and lack a structured taxonomy. To bridge this gap, we conduct a comprehensive survey and propose a systematic taxonomy for LALM evaluations, categorizing them into four dimensions based on their objectives: (1) General Auditory Awareness and Processing, (2) Knowledge and Reasoning, (3) Dialogue-oriented Ability, and (4) Fairness, Safety, and Trustworthiness. We provide detailed overviews within each category and highlight challenges in this field, offering insights into promising future directions. To the best of our knowledge, this is the first survey specifically focused on the evaluations of LALMs, providing clear
Related papers
- AudioBench: A Universal Benchmark For Audio Large Language Models (2024) 10.21
- Measuring Audio's Impact On Correctness: Audio-contribution-aware Post-training Of Large Audio Language Models (2025) 0.00
- All That Glitters Is Not Audio: Rethinking Text Priors And Audio Reliance In Audio-language Evaluation (2026) 0.00
- A Survey On Speech Large Language Models For Understanding (2024) 4.52
- Roadmap Towards Superhuman Speech Understanding Using Large Language Models (2024) 0.00
- Audiotoolagent: An Agentic Framework For Audio-language Models (2025) 2.60
- Au-m-ol: A Unified Model For Medical Audio And Language Understanding (2026) 0.00
- Hearing To Translate: The Effectiveness Of Speech Modality Integration Into LLMs (2026) 0.00