Automated Audio Captioning Via Fusion Of Low- And High- Dimensional Features
2022 · Jianyuan Sun, Xubo Liu, Xinhao Mei, et al.
Abstract
Automated audio captioning (AAC) aims to describe the content of an audio clip using simple sentences. Existing AAC methods are built on an encoder-decoder architecture whose success is attributed to the use of a pre-trained CNN10 called PANNs as the encoder to learn rich audio representations. AAC is a highly challenging task because its high-dimensional latent space involves audio from various scenarios. Existing methods use only the high-dimensional representation of the PANNs as the input of the decoder. However, the low-dimensional representation, which may retain as much audio information as the high-dimensional representation, may be neglected. In addition, although the high-dimensional approach can predict audio captions by learning from existing audio captions, it lacks robustness and efficiency. To deal with these challenges, an AAC framework with a fusion model that integrates low- and high-dimensional features is proposed. In this paper, a new encoder-decoder framework is propo
Related papers
- Dual Transformer Decoder Based Features Fusion Network For Automated Audio Captioning (2023) · 4.52
- Improving The Performance Of Automated Audio Captioning Via Integrating The Acoustic And Semantic Information (2021) · 2.00
- Enhancing Automated Audio Captioning Via Large Language Models With Optimized Audio Encoding (2024) · 5.24
- Beyond The Status Quo: A Contemporary Survey Of Advances And Challenges In Audio Captioning (2022) · 9.03
- CoNeTTE: An Efficient Audio Captioning System Leveraging Multiple Datasets With Task Embedding (2023) · 11.11
- WaveTransformer: A Novel Architecture For Audio Captioning Based On Learning Temporal And Time-frequency Information (2020) · 0.00
- Improving Audio Captioning Models With Fine-grained Audio Features, Text Embedding Supervision, And LLM Mix-up Augmentation (2023) · 8.82
- Automatic Audio Captioning Using Attention Weighted Event Based Embeddings (2022) · 0.00