Text-to-audio Grounding Based Novel Metric For Evaluating Audio Caption Similarity
2022 · Swapnil Bhosale, Rupayan Chakraborty, Sunil Kumar Kopparapu
Abstract
Automatic Audio Captioning (AAC) refers to the task of translating an audio sample into a natural language (NL) text that describes the audio events, the source of the events, and their relationships. Unlike NL text generation tasks, which are evaluated with metrics based on lexical semantics such as BLEU, ROUGE, and METEOR, an AAC evaluation metric must, in addition to capturing lexical semantics, be able to map NL text (phrases) that correspond to similar sounds. Current metrics used for evaluation of AAC tasks lack an understanding of the perceived properties of sound represented by text. In this paper, we propose a novel metric based on Text-to-Audio Grounding (TAG), which is useful for evaluating cross-modal tasks like AAC. Experiments on a publicly available AAC dataset show that our evaluation metric performs better than existing metrics used in the NL text and image captioning literature.
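To make the limitation of lexical metrics concrete, the following is a minimal sketch of a clipped unigram precision, a simplified BLEU-1-style score without the brevity penalty. It is not the TAG-based metric proposed in the paper. The function names and the three captions are made up for illustration and do not come from any dataset.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate caption against one reference.

    Simplified illustration only: whitespace tokenization, a single
    reference, no brevity penalty, no smoothing.
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Hypothetical captions, invented for this example.
reference = "a dog barks loudly in the distance"
similar_sound = "a hound is woofing far away"             # similar sound, different words
different_sound = "a dog sleeps quietly in the distance"  # different sound, shared words

score_similar = clipped_precision(similar_sound, reference)
score_different = clipped_precision(different_sound, reference)
print(f"similar sound:   {score_similar:.3f}")
print(f"different sound: {score_different:.3f}")
```

In this toy case the caption describing a different sound scores higher than the paraphrase describing a similar sound, because only word overlap is counted. This is the kind of mismatch the abstract attributes to metrics based on lexical semantics.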
Related papers
- Investigations In Audio Captioning: Addressing Vocabulary Imbalance And Evaluating Suitability Of Language-centric Performance Metrics (2022) 0.00
- Beyond The Status Quo: A Contemporary Survey Of Advances And Challenges In Audio Captioning (2022) 9.03
- Interactive Audio-text Representation For Automated Audio Captioning With Contrastive Learning (2022) 0.00
- WaveTransformer: A Novel Architecture For Audio Captioning Based On Learning Temporal And Time-frequency Information (2020) 0.00
- Evaluating Off-the-shelf Machine Listening And Natural Language Models For Automated Audio Captioning (2021) 0.00
- CLAIR-A: Leveraging Large Language Models To Judge Audio Captions (2024) 2.00
- Killing Two Birds With One Stone: Can An Audio Captioning System Also Be Used For Audio-text Retrieval? (2023) 0.00
- Investigating Local And Global Information For Automated Audio Captioning With Transfer Learning (2021) 0.00