Convolutional Gated Recurrent Neural Network Incorporating Spatial Features For Audio Tagging
2017 · Yong Xu, Qiuqiang Kong, Qiang Huang, et al.
Abstract
Environmental audio tagging is a newly proposed task to predict the presence or absence of a specific audio event in a chunk. Deep neural network (DNN) based methods have been successfully adopted for predicting the audio tags in the domestic audio scene. In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms or even raw waveforms for audio tagging. Gated recurrent unit (GRU) based recurrent neural networks (RNNs) are then cascaded to model the long-term temporal structure of the audio signal. To complement the input information, an auxiliary CNN is designed to learn on the spatial features of stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging) of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. Compared with our recent DNN-based method, the proposed structure can reduce the equal error rate (EER) from 0.13 to 0.11 on the development
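The abstract reports results as an equal error rate (EER), the operating point at which the false-positive rate equals the false-negative rate. As a minimal sketch of how that number can be computed for a single tag, the pure-Python function below sweeps a threshold over the observed scores. The function name, the threshold sweep, and the toy data are illustrative assumptions, not the authors' evaluation code.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER for one tag (hypothetical helper, not from the paper).

    scores: predicted presence scores, higher means "tag present".
    labels: ground-truth values, 1 for present and 0 for absent.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best_gap, eer = None, None
    # Candidate thresholds: every distinct score, plus one above the maximum.
    for threshold in sorted(set(scores)) + [float("inf")]:
        # A chunk is predicted "present" when its score reaches the threshold.
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        false_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        fpr = false_pos / negatives
        fnr = false_neg / positives
        gap = abs(fpr - fnr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


# Toy data for one tag (made up for illustration only).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(equal_error_rate(scores, labels))
```

With several tags, one plausible use is to call the function once per tag and average the results; whether that matches the challenge's exact protocol should be checked against its evaluation description.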
Related papers
- Attention And Localization Based On A Deep Convolutional Recurrent Model For Weakly Supervised Audio Tagging (2017)
- Combining High-level Features Of Raw Audio Waves And Mel-spectrograms For Audio Tagging (2018)
- A Light-weight Multimodal Framework For Improved Environmental Audio Tagging (2017)
- Sample Mixed-based Data Augmentation For Domestic Audio Tagging (2018)
- Fully DNN-based Multi-label Regression For Audio Tagging (2016)
- Audio-based Music Classification With DenseNet And Data Augmentation (2019)
- Automatic Tagging Using Deep Convolutional Neural Networks (2016)
- Classifying Variable-length Audio Files With All-convolutional Networks And Masked Global Pooling (2016)