Improving The Performance Of Automated Audio Captioning Via Integrating The Acoustic And Semantic Information
2021 · Zhongjie Ye, Helin Wang, Dongchao Yang, et al.
Abstract
Automated audio captioning (AAC) has developed rapidly in recent years, combining acoustic signal processing and natural language processing to generate human-readable sentences for audio clips. Current models are generally based on a neural encoder-decoder architecture, and their decoders mainly use acoustic information extracted by a CNN-based encoder. However, they ignore semantic information that could help the AAC model generate meaningful descriptions. This paper proposes a novel approach to automated audio captioning that incorporates both semantic and acoustic information. Specifically, our audio captioning model consists of two sub-modules. (1) The pre-trained keyword encoder initializes its parameters from a pre-trained ResNet38 and is then trained with extracted keywords as labels. (2) The multi-modal attention decoder adopts an LSTM-based decoder that contains semantic and acoustic attention modules. Experiments demonstrate that our propose
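The decoder described in the abstract attends separately to acoustic features and to semantic (keyword) information. The following is a minimal sketch of one such decoding step, not the authors' implementation. It assumes plain dot-product attention and simple concatenation of the two context vectors, neither of which is stated in the abstract. The function names (`attend`, `fuse_contexts`) and the toy vectors are hypothetical, and the LSTM update that would consume the fused vector is left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, items):
    """Dot-product attention (an assumption; the abstract does not give the form).

    Returns the weighted sum of `items` and the attention weights.
    """
    scores = [sum(q * x for q, x in zip(query, item)) for item in items]
    weights = softmax(scores)
    dim = len(items[0])
    context = [sum(w * item[i] for w, item in zip(weights, items)) for i in range(dim)]
    return context, weights


def fuse_contexts(hidden, acoustic_frames, keyword_embeddings):
    """Attend to both sources with the decoder state, then concatenate.

    Concatenation is a hypothetical fusion choice made for illustration.
    """
    acoustic_ctx, acoustic_w = attend(hidden, acoustic_frames)
    semantic_ctx, semantic_w = attend(hidden, keyword_embeddings)
    return acoustic_ctx + semantic_ctx, acoustic_w, semantic_w


# Toy inputs: a 2-dimensional decoder state, three acoustic frames, two keywords.
hidden = [1.0, 0.0]
acoustic_frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
keyword_embeddings = [[0.0, 1.0], [2.0, 0.0]]

fused, acoustic_w, semantic_w = fuse_contexts(hidden, acoustic_frames, keyword_embeddings)
```

In a full model, `fused` would be fed, together with the previous word embedding, into the LSTM cell at each time step. Here it only shows how the two attention modules contribute separate context vectors.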
Related papers
- Improving Audio Captioning Models With Fine-grained Audio Features, Text Embedding Supervision, And LLM Mix-up Augmentation (2023) 8.82
- Beyond The Status Quo: A Contemporary Survey Of Advances And Challenges In Audio Captioning (2022) 9.03
- Enhancing Automated Audio Captioning Via Large Language Models With Optimized Audio Encoding (2024) 5.24
- Automated Audio Captioning Via Fusion Of Low- And High- Dimensional Features (2022) 0.00
- Automatic Audio Captioning Using Attention Weighted Event Based Embeddings (2022) 0.00
- Automated Audio Captioning: An Overview Of Recent Progress And New Challenges (2022) 12.10
- Evaluating Off-the-shelf Machine Listening And Natural Language Models For Automated Audio Captioning (2021) 0.00
- Investigating Local And Global Information For Automated Audio Captioning With Transfer Learning (2021) 0.00