ArtistMus: A Globally Diverse, Artist-centric Benchmark For Retrieval-augmented Music Question Answering
2025 · Daeyong Kwon, Seungheon Doh, Juhan Nam
Abstract
Recent advances in large language models (LLMs) have transformed open-domain question answering, yet their effectiveness in music-related reasoning remains limited due to sparse music knowledge in pretraining data. While music information retrieval and computational musicology have explored structured and multimodal understanding, few resources support factual and contextual music question answering (MQA) grounded in artist metadata or historical context. We introduce MusWikiDB, a vector database of 3.2M passages from 144K music-related Wikipedia pages, and ArtistMus, a benchmark of 1,000 questions on 500 diverse artists with metadata such as genre, debut year, and topic. These resources enable systematic evaluation of retrieval-augmented generation (RAG) for MQA. Experiments show that RAG markedly improves factual accuracy; open-source models gain up to +56.8 percentage points (for example, Qwen3 8B improves from 35.0 to 91.8), approaching proprietary model performance. RAG-style fine
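The retrieval-augmented setup summarized in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three toy passages stand in for a passage store (they are not MusWikiDB data), bag-of-words cosine similarity stands in for whatever embedding model the authors use, and the names retrieve, build_prompt, and gain_pp are hypothetical. The last function only reproduces the percentage-point arithmetic quoted above (35.0 to 91.8).

```python
import math
import re
from collections import Counter

# Hypothetical toy passages; placeholders, not real MusWikiDB content.
PASSAGES = [
    "Artist A is a jazz pianist who debuted in 1990.",
    "Artist B is a rock band formed in 2005.",
    "Artist C is a folk singer who debuted in 1975.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question.

    Assumption: bag-of-words similarity replaces a learned embedding model.
    """
    query = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(query, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Prepend retrieved passages to the question, as a RAG prompt would."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def gain_pp(before, after):
    """Accuracy gain in percentage points, rounded to one decimal."""
    return round(after - before, 1)


if __name__ == "__main__":
    print(build_prompt("When did the jazz pianist debut?", PASSAGES, k=1))
    print(gain_pp(35.0, 91.8))
```

In a real pipeline the prompt returned by build_prompt would be passed to a language model; that call is omitted here because the model interface is not specified in the text.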
Related papers
- WikiMuTe: A Web-sourced Dataset Of Semantic Descriptions For Music Audio (2023) · 5.24
- Incompebench: A Permissively Licensed, Fine-grained Benchmark For Music Information Retrieval (2026) · 0.00
- Enriching Music Descriptions With A Finetuned-LLM And Metadata For Text-to-music Retrieval (2024) · 7.50
- ArtSeek: Deep Artwork Understanding Via Multimodal In-context Reasoning And Late Interaction Retrieval (2025) · 2.16
- Music4All A+A: A Multimodal Dataset For Music Information Retrieval Tasks (2025) · 0.95
- MuRAG: Multimodal Retrieval-augmented Generator For Open Question Answering Over Images And Text (2022) · 14.66
- Cross-modal Music Retrieval And Applications: An Overview Of Key Methodologies (2019) · 12.68
- Multimodal Metric Learning For Tag-based Music Retrieval (2020) · 9.76