Crowdsourcing A Dataset Of Audio Captions
2019 · Samuel Lipping, Konstantinos Drossos, Tuomas Virtanen
Abstract
Audio captioning is a novel field of multi-modal translation: the task of creating a textual description of the content of an audio signal (e.g. "people talking in a big room"). Creating a dataset for this task requires a considerable amount of work, which makes crowdsourcing a very attractive option. In this paper we present a three-step framework for crowdsourcing an audio captioning dataset, based on concepts and practices followed in the creation of widely used image captioning and machine translation datasets. In the first step, initial captions are gathered. In the second step, a grammatically corrected and/or rephrased version of each initial caption is obtained. Finally, the initial and edited captions are rated, and the top-rated ones are kept for the produced dataset. We objectively evaluate the impact of our framework during the process of creating an audio captioning dataset, in terms of diversity and the number of typographical errors in the obtained captions. Th
Related papers
- Automated Audio Captioning: An Overview Of Recent Progress And New Challenges (2022) 12.10
- Clotho: An Audio Captioning Dataset (2019) 17.70
- Audio Caption: Listen And Tell (2019) 10.97
- AudioSetCaps: An Enriched Audio-caption Dataset Using Automated Generation Pipeline With Large Audio And Language Models (2024) 13.44
- Auto-ACD: A Large-scale Dataset For Audio-language Representation Learning (2023) 10.74
- CrowdSpeech And VoxDIY: Benchmark Datasets For Crowdsourced Audio Transcription (2021) 0.00
- Investigations In Audio Captioning: Addressing Vocabulary Imbalance And Evaluating Suitability Of Language-centric Performance Metrics (2022) 0.00
- WavCaps: A ChatGPT-assisted Weakly-labelled Audio Captioning Dataset For Audio-language Multimodal Research (2023) 20.69