AQUA-Bench: Beyond Finding Answers To Knowing When There Are None In Audio Question Answering
2026 · Chun-Yi Kuan, Hung-Yi Lee
Abstract
arXiv:2601.12248v2
Recent advances in audio-aware large language models have shown strong performance on audio question answering. However, existing benchmarks mainly cover answerable questions and overlook the challenge of unanswerable ones, where no reliable answer can be inferred from the audio. Such cases are common in real-world settings, where questions may be misleading, ill-posed, or incompatible with the information. To address this gap, we present AQUA-Bench, a benchmark for Audio Question Unanswerability Assessment. It systematically evaluates three scenarios: Absent Answer Detection (the correct option is missing), Incompatible Answer Set Detection (choices are categorically mismatched with the question), and Incompatible Audio Question Detection (the question is irrelevant or lacks sufficient grounding in the audio). By assessing these cases, AQUA-Bench offers a rigorous measure of model reliability and promotes the development of au
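The three scenarios named in the abstract can be illustrated with a minimal Python sketch that derives an unanswerable item from an ordinary multiple-choice item. This is not the authors' construction pipeline: the `AudioMCQ` type, the three helper functions, the convention that `answer=None` marks an unanswerable item, and the example clip and questions are all assumptions made here for illustration.

```python
from dataclasses import dataclass, replace
from typing import Optional, Sequence


@dataclass(frozen=True)
class AudioMCQ:
    """Hypothetical multiple-choice audio QA item (not the benchmark's schema)."""
    audio_id: str
    question: str
    options: tuple[str, ...]
    answer: Optional[str]  # None marks the item as unanswerable (assumed convention)


def absent_answer(item: AudioMCQ) -> AudioMCQ:
    """Absent Answer Detection: drop the correct option from the choices."""
    remaining = tuple(o for o in item.options if o != item.answer)
    return replace(item, options=remaining, answer=None)


def incompatible_answer_set(item: AudioMCQ, foreign_options: Sequence[str]) -> AudioMCQ:
    """Incompatible Answer Set Detection: keep the question, swap in
    choices from a different category."""
    return replace(item, options=tuple(foreign_options), answer=None)


def incompatible_audio_question(
    item: AudioMCQ, unrelated_question: str, unrelated_options: Sequence[str]
) -> AudioMCQ:
    """Incompatible Audio Question Detection: keep the audio, ask something
    the clip cannot ground."""
    return replace(
        item,
        question=unrelated_question,
        options=tuple(unrelated_options),
        answer=None,
    )


# Made-up answerable item used as the starting point.
base = AudioMCQ(
    audio_id="clip_001",
    question="Which animal is heard in the clip?",
    options=("dog", "cat", "bird", "horse"),
    answer="dog",
)

no_correct_option = absent_answer(base)
mismatched_choices = incompatible_answer_set(base, ["red", "blue", "green", "yellow"])
ungrounded_question = incompatible_audio_question(
    base,
    "What year was the recording device manufactured?",
    ["1990", "2005", "2012", "2020"],
)
```

In each derived item the audio is unchanged and only the question or the choices are perturbed, so a model that still picks one of the listed options would be counted as failing to recognize unanswerability under this assumed setup.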
Related papers
- All That Glitters Is Not Audio: Rethinking Text Priors And Audio Reliance In Audio-language Evaluation (2026) 0.00
- AudioBench: A Universal Benchmark For Audio Large Language Models (2024) 10.21
- Walking Through Uncertainty: An Empirical Study Of Uncertainty Estimation For Audio-aware Large Language Models (2026) 0.00
- Measuring Audio's Impact On Correctness: Audio-contribution-aware Post-training Of Large Audio Language Models (2025) 0.00
- ASK: Adaptive Self-improving Knowledge Framework For Audio Text Retrieval (2025) 0.00
- Towards Holistic Evaluation Of Large Audio-language Models: A Comprehensive Survey (2026) 9.75
- Exploring Audio Hallucination In Egocentric Video Understanding (2026) 0.00
- AudioToolAgent: An Agentic Framework For Audio-language Models (2025) 2.60