Attention And Localization Based On A Deep Convolutional Recurrent Model For Weakly Supervised Audio Tagging
2017 · Yong Xu, Qiuqiang Kong, Qiang Huang, et al.
Abstract
Audio tagging aims to perform multi-label classification on audio chunks; it is a newly proposed task in the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. This task encourages research efforts to better analyze and understand the content of the huge amounts of audio data on the web. The difficulty in audio tagging is that only chunk-level labels are available, without frame-level labels. This paper presents a weakly supervised method that not only predicts the tags but also indicates the temporal locations of the acoustic events that occur. The attention scheme is found to be effective in identifying the important frames while ignoring the unrelated frames. The proposed framework is a deep convolutional recurrent model with two auxiliary modules: an attention module and a localization module. The proposed algorithm was evaluated on Task 4 of the DCASE 2016 challenge. State-of-the-art performance was achieved on the evaluation set with equal error rate (
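The idea of weighting important frames and ignoring unrelated ones can be illustrated with a small sketch. The abstract does not give the paper's actual attention formulation, so the code below is only a generic assumption: per-frame tag probabilities are combined into chunk-level predictions using a softmax over time. The function name `attention_pool`, the array shapes, and the random inputs are all hypothetical.

```python
import numpy as np

def attention_pool(frame_probs, attention_logits):
    """Pool frame-level tag probabilities into chunk-level predictions.

    frame_probs:      (T, K) array, per-frame probability of each of K tags.
    attention_logits: (T, K) array, unnormalized importance of each frame.
    Returns (chunk_probs of shape (K,), weights of shape (T, K)).
    NOTE: a generic softmax-over-time pooling, assumed for illustration;
    not necessarily the formulation used in the paper.
    """
    # Numerically stable softmax along the time axis, one per tag.
    shifted = attention_logits - attention_logits.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    # Weighted average over frames: frames with low weight barely contribute.
    chunk_probs = (weights * frame_probs).sum(axis=0)
    return chunk_probs, weights

# Toy input: 6 frames, 3 tags (made-up values, not real audio features).
rng = np.random.default_rng(0)
frame_probs = rng.uniform(size=(6, 3))
attention_logits = rng.normal(size=(6, 3))
chunk_probs, weights = attention_pool(frame_probs, attention_logits)
```

Under this assumption, only `chunk_probs` needs to be compared against the chunk-level labels during training, while `weights` gives a rough per-frame indication of where each tag is active.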
Related papers
- Convolutional Gated Recurrent Neural Network Incorporating Spatial Features For Audio Tagging (2017)
- A Light-weight Multimodal Framework For Improved Environmental Audio Tagging (2017)
- Fully DNN-based Multi-label Regression For Audio Tagging (2016)
- Sample Mixed-based Data Augmentation For Domestic Audio Tagging (2018)
- A Deep Neural Network For Audio Classification With A Classifier Attention Mechanism (2020)
- Classifying Variable-length Audio Files With All-convolutional Networks And Masked Global Pooling (2016)
- Combining High-level Features Of Raw Audio Waves And Mel-spectrograms For Audio Tagging (2018)
- Impact Of Temporal Resolution On Convolutional Recurrent Networks For Audio Tagging And Sound Event Detection (2022)