An Encoder-decoder Based Audio Captioning System With Transfer And Reinforcement Learning
2021 · Xinhao Mei, Qiushi Huang, Xubo Liu, et al.
Abstract
Automated audio captioning aims to describe the content of audio data in natural language. This paper presents an audio captioning system with an encoder-decoder architecture, where the decoder predicts words based on audio features extracted by the encoder. To improve the proposed system, transfer learning from either an upstream audio-related task or a large in-domain dataset is introduced to mitigate the problem of data scarcity. In addition, evaluation metrics are incorporated into the optimization of the model through reinforcement learning, which helps address both the "exposure bias" induced by the "teacher forcing" training strategy and the mismatch between the evaluation metrics and the loss function. The resulting system was ranked 3rd in DCASE 2021 Task 6. Ablation studies are carried out to investigate how much each element of the proposed system contributes to the final performance. The results show that the proposed techniques significantly improve the scores of
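The idea of optimizing a captioning model directly on an evaluation metric can be illustrated with a small sketch. The abstract does not say which reinforcement learning algorithm is used, so the code below assumes a REINFORCE-style policy gradient with a greedy-decoding baseline (often called self-critical training). The function names and the word-overlap reward are hypothetical stand-ins; a real system would use a captioning metric and a neural decoder.

```python
import math

def sequence_log_prob(step_probs):
    """Sum the log-probabilities the decoder assigned to each sampled word."""
    return sum(math.log(p) for p in step_probs)

def toy_reward(candidate, reference):
    """Hypothetical reward: fraction of distinct reference words that appear
    in the candidate. This stands in for a real caption metric."""
    ref_words = set(reference)
    return len(ref_words & set(candidate)) / len(ref_words)

def self_critical_loss(sampled, sampled_probs, greedy, reference):
    """Assumed REINFORCE-style loss with a greedy baseline.

    The advantage is the reward of the sampled caption minus the reward of
    the greedily decoded caption. Minimizing the loss raises the probability
    of sampled captions that beat the baseline and lowers it otherwise.
    """
    advantage = toy_reward(sampled, reference) - toy_reward(greedy, reference)
    return -advantage * sequence_log_prob(sampled_probs)

reference = ["a", "dog", "barks", "loudly"]
greedy = ["a", "dog", "runs"]
sampled = ["a", "dog", "barks"]
loss = self_critical_loss(sampled, [0.5, 0.5, 0.5], greedy, reference)
```

Because the captions here are generated by the model itself rather than fed from the ground truth, training of this kind sees the same conditions as inference, which is the sense in which it sidesteps the exposure bias of teacher forcing.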
Related papers
- Automated Audio Captioning: An Overview Of Recent Progress And New Challenges (2022)
- Enhancing Automated Audio Captioning Via Large Language Models With Optimized Audio Encoding (2024)
- Listen Carefully And Tell: An Audio Captioning System Based On Residual Learning And Gammatone Audio Representation (2020)
- Automated Audio Captioning With Recurrent Neural Networks (2017)
- Beyond The Status Quo: A Contemporary Survey Of Advances And Challenges In Audio Captioning (2022)
- Improving Audio Captioning Models With Fine-grained Audio Features, Text Embedding Supervision, And LLM Mix-up Augmentation (2023)
- Investigating Local And Global Information For Automated Audio Captioning With Transfer Learning (2021)
- Conette: An Efficient Audio Captioning System Leveraging Multiple Datasets With Task Embedding (2023)