The NTT DCASE2020 Challenge Task 6 System: Automated Audio Captioning With Keywords And Sentence Length Estimation
2020 · Yuma Koizumi, Daiki Takeuchi, Yasunori Ohishi, et al.
Abstract
This technical report describes our system submitted to Task 6 (automated audio captioning) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge. Our submission focuses on two indeterminacy problems in automated audio captioning: word-selection indeterminacy and sentence-length indeterminacy. We tackle the main caption-generation task and these two sub-problems simultaneously by estimating keywords and sentence length through multi-task learning. We evaluated a simplified version of our submitted model on the development-testing dataset, where it achieved a SPIDEr score of 20.7, compared with 5.4 for the baseline system.
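The multi-task objective described above can be pictured as a weighted sum of three losses: one for the caption itself, one for keyword estimation, and one for sentence-length estimation. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the weights `alpha` and `beta`, the use of multi-label binary cross-entropy for keywords, and the treatment of sentence length as classification over length bins are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def caption_nll(token_probs: Sequence[Sequence[float]], target_ids: Sequence[int]) -> float:
    """Mean negative log-likelihood of the reference caption (teacher forcing).

    token_probs[t] is the decoder's distribution over the vocabulary at step t;
    target_ids[t] is the index of the reference word at step t.
    """
    return -sum(math.log(p[y]) for p, y in zip(token_probs, target_ids)) / len(target_ids)


def keyword_bce(keyword_probs: Sequence[float], keyword_targets: Sequence[int],
                eps: float = 1e-12) -> float:
    """Multi-label binary cross-entropy over a (hypothetical) keyword vocabulary.

    keyword_targets[k] is 1 if keyword k appears in the reference caption, else 0.
    Probabilities are clipped to avoid log(0).
    """
    total = 0.0
    for p, y in zip(keyword_probs, keyword_targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(keyword_targets)


def length_ce(length_probs: Sequence[float], true_length_bin: int) -> float:
    """Cross-entropy of the sentence length, assumed here to be a class over length bins."""
    return -math.log(length_probs[true_length_bin])


def multitask_loss(token_probs, target_ids, keyword_probs, keyword_targets,
                   length_probs, true_length_bin,
                   alpha: float = 1.0, beta: float = 1.0) -> float:
    """Caption loss plus weighted keyword and length sub-task losses (hypothetical weights)."""
    return (caption_nll(token_probs, target_ids)
            + alpha * keyword_bce(keyword_probs, keyword_targets)
            + beta * length_ce(length_probs, true_length_bin))


if __name__ == "__main__":
    # Toy example: 2-word vocabulary, 3 keywords, 2 length bins.
    loss = multitask_loss(
        token_probs=[[0.9, 0.1], [0.2, 0.8]],
        target_ids=[0, 1],
        keyword_probs=[0.7, 0.1, 0.6],
        keyword_targets=[1, 0, 1],
        length_probs=[0.3, 0.7],
        true_length_bin=1,
    )
    print(f"total loss: {loss:.4f}")
```

In a real system each term would be computed on framework tensors and averaged over a mini-batch; the point of the sketch is only that the keyword and length heads share the encoder with the caption decoder and contribute auxiliary gradients through the summed loss.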
Related papers
- Automated Audio Captioning And Language-based Audio Retrieval (2022)
- An Encoder-decoder Based Audio Captioning System With Transfer And Reinforcement Learning (2021)
- Multitask Learning In Audio Captioning: A Sentence Embedding Regression Loss Acts As A Regularizer (2023)
- Performance Improvement Of Language-queried Audio Source Separation Based On Caption Augmentation From Large Language Models For DCASE Challenge 2024 Task 9 (2024)
- Audio-visual Scene Classification: Analysis Of DCASE 2021 Challenge Submissions (2021)
- Automated Audio Captioning: An Overview Of Recent Progress And New Challenges (2022)
- Temporal Sub-sampling Of Audio Feature Sequences For Automated Audio Captioning (2020)
- Investigations In Audio Captioning: Addressing Vocabulary Imbalance And Evaluating Suitability Of Language-centric Performance Metrics (2022)