Investigations In Audio Captioning: Addressing Vocabulary Imbalance And Evaluating Suitability Of Language-centric Performance Metrics
2022 Β· Sandeep Kothinti, Dimitra Emmanouilidou
Abstract
The analysis, processing, and extraction of meaningful information from sounds all around us is the subject of the broader area of audio analytics. Audio captioning is a recent addition to the domain of audio analytics, a cross-modal translation task that focuses on generating natural descriptions from sound events occurring in an audio stream. In this work, we identify and improve on three main challenges in automated audio captioning: i) data scarcity, ii) imbalance or limitations in the audio captions vocabulary, and iii) the proper performance evaluation metric that can best capture both auditory and semantic characteristics. We find that generally adopted loss functions can result in an unfair vocabulary imbalance during model training. We propose two audio captioning augmentation methods that enrich the training dataset and the vocabulary size. We further underline the need for in-domain pretraining by exploring the suitability of audio encoders that were previously trained on di
Authors
(none)
Tags
Stats
Related papers
- Automated Audio Captioning: An Overview Of Recent Progress And New Challenges (2022)12.10
- Beyond The Status Quo: A Contemporary Survey Of Advances And Challenges In Audio Captioning (2022)9.03
- Improving Audio Captioning Models With Fine-grained Audio Features, Text Embedding Supervision, And LLM Mix-up Augmentation (2023)8.82
- Investigating Local And Global Information For Automated Audio Captioning With Transfer Learning (2021)0.00
- An Encoder-decoder Based Audio Captioning System With Transfer And Reinforcement Learning (2021)0.00
- Audio Caption: Listen And Tell (2019)10.97
- Enhancing Automated Audio Captioning Via Large Language Models With Optimized Audio Encoding (2024)5.24
- Evaluating Off-the-shelf Machine Listening And Natural Language Models For Automated Audio Captioning (2021)0.00