WikiMuTe: A Web-sourced Dataset Of Semantic Descriptions For Music Audio
2023 · Benno Weck, Holger Kirchhoff, Peter Grosche, et al.
Abstract
Multi-modal deep learning techniques for matching free-form text with music have shown promising results in the field of Music Information Retrieval (MIR). Prior work is often based on large proprietary data while publicly available datasets are few and small in size. In this study, we present WikiMuTe, a new and open dataset containing rich semantic descriptions of music. The data is sourced from Wikipedia's rich catalogue of articles covering musical works. Using a dedicated text-mining pipeline, we extract both long and short-form descriptions covering a wide range of topics related to music content such as genre, style, mood, instrumentation, and tempo. To show the use of this data, we train a model that jointly learns text and audio representations and performs cross-modal retrieval. The model is evaluated on two tasks: tag-based music retrieval and music auto-tagging. The results show that while our approach has state-of-the-art performance on multiple tasks, we still observe a
Related papers
- Enriching Music Descriptions With A Finetuned-LLM And Metadata For Text-to-music Retrieval (2024) · 7.50
- MusicTM-Dataset For Joint Representation Learning Among Sheet Music, Lyrics, And Musical Audio (2020) · 3.58
- ArtistMus: A Globally Diverse, Artist-centric Benchmark For Retrieval-augmented Music Question Answering (2025) · 0.00
- Multimodal Metric Learning For Tag-based Music Retrieval (2020) · 9.76
- Incompebench: A Permissively Licensed, Fine-grained Benchmark For Music Information Retrieval (2026) · 0.00
- Music4All A+A: A Multimodal Dataset For Music Information Retrieval Tasks (2025) · 0.95
- Towards Robust And Truly Large-scale Audio-sheet Music Retrieval (2023) · 4.52
- Exploring Modality-agnostic Representations For Music Classification (2021) · 0.00