Automated Audio Captioning: An Overview Of Recent Progress And New Challenges
2022 · Xinhao Mei, Xubo Liu, Mark D. Plumbley, et al.
Abstract
Automated audio captioning is a cross-modal translation task that aims to generate natural language descriptions for given audio clips. This task has received increasing attention with the release of freely available datasets in recent years. The problem has been addressed predominantly with deep learning techniques. Numerous approaches have been proposed, such as investigating different neural network architectures, exploiting auxiliary information such as keywords or sentence information to guide caption generation, and employing different training strategies, which have greatly facilitated the development of this field. In this paper, we present a comprehensive review of the published contributions in automated audio captioning, from a variety of existing approaches to evaluation metrics and datasets. We also discuss open challenges and envisage possible future research directions.
Related papers
- Beyond The Status Quo: A Contemporary Survey Of Advances And Challenges In Audio Captioning (2022)
- An Encoder-decoder Based Audio Captioning System With Transfer And Reinforcement Learning (2021)
- Crowdsourcing A Dataset Of Audio Captions (2019)
- Investigations In Audio Captioning: Addressing Vocabulary Imbalance And Evaluating Suitability Of Language-centric Performance Metrics (2022)
- Audio Caption: Listen And Tell (2019)
- Improving The Performance Of Automated Audio Captioning Via Integrating The Acoustic And Semantic Information (2021)
- Investigating Local And Global Information For Automated Audio Captioning With Transfer Learning (2021)
- Evaluating Off-the-shelf Machine Listening And Natural Language Models For Automated Audio Captioning (2021)