The Song Describer Dataset: A Corpus Of Audio Captions For Music-and-language Evaluation
2023 · Ilaria Manco, Benno Weck, Seungheon Doh, et al.
Abstract
We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models. The dataset consists of 1.1k human-written natural language descriptions of 706 music recordings, all publicly accessible and released under Creative Commons licenses. To showcase the use of our dataset, we benchmark popular models on three key music-and-language tasks (music captioning, text-to-music generation and music-language retrieval). Our experiments highlight the importance of cross-dataset evaluation and offer insights into how researchers can use SDD to gain a broader understanding of model performance.
Related papers
- Crowdsourcing A Dataset Of Audio Captions (2019)
- Audio Caption: Listen And Tell (2019)
- S2cap: A Benchmark And A Baseline For Singing Style Captioning (2024)
- Audiosetcaps: An Enriched Audio-caption Dataset Using Automated Generation Pipeline With Large Audio And Language Models (2024)
- Auto-acd: A Large-scale Dataset For Audio-language Representation Learning (2023)
- Musictm-dataset For Joint Representation Learning Among Sheet Music, Lyrics, And Musical Audio (2020)
- Muscaps: Generating Captions For Music Audio (2021)
- Sound-vecaps: Improving Audio Generation With Visual Enhanced Captions (2024)