Classifying Variable-length Audio Files With All-convolutional Networks And Masked Global Pooling
2016 · Lars Hertel, Huy Phan, Alfred Mertins
Abstract
We trained a deep all-convolutional neural network with masked global pooling to perform single-label classification for acoustic scene classification and multi-label classification for domestic audio tagging in the DCASE-2016 contest. Our network achieved an average accuracy of 84.5% in four-fold cross-validation for acoustic scene recognition, compared to the provided baseline of 72.5%, and an average equal error rate of 0.17 for domestic audio tagging, compared to the baseline of 0.21. The network therefore improves on the baselines by a relative 17% and 19%, respectively. The network consists only of convolutional layers, which extract features from the short-time Fourier transform, and one global pooling layer, which combines those features. Notably, it contains no dropout layers and no fully-connected layers other than the output layer.
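To illustrate the idea of masked global pooling for variable-length inputs, here is a minimal NumPy sketch. It is not the authors' implementation: the function name `masked_global_mean_pool`, the `(batch, channels, time)` layout, and the choice of mean pooling over the time axis are assumptions made for illustration. The point is only that padded frames are excluded, so each clip is pooled over its own valid length.

```python
import numpy as np


def masked_global_mean_pool(features, lengths):
    """Average feature maps over time, ignoring padded frames.

    features: array of shape (batch, channels, time), padded along time.
    lengths:  number of valid (unpadded) frames per example, shape (batch,).
    Returns an array of shape (batch, channels).
    (Hypothetical helper; layout and mean pooling are assumptions.)
    """
    features = np.asarray(features, dtype=np.float64)
    lengths = np.asarray(lengths)
    time_steps = features.shape[-1]

    # mask[b, t] is True where frame t of example b is real data.
    mask = np.arange(time_steps)[None, :] < lengths[:, None]

    # Zero out padded frames, then divide by the true length, not the padded one.
    masked = features * mask[:, None, :]
    return masked.sum(axis=-1) / lengths[:, None]


# Two clips padded to 5 frames: the first has 2 valid frames, the second 5.
batch = np.zeros((2, 3, 5))
batch[0, :, :2] = 4.0
batch[0, :, 2:] = 99.0  # arbitrary values in the padded region
batch[1, :, :] = 1.0

pooled = masked_global_mean_pool(batch, [2, 5])
```

Because the pooled output has a fixed size of `(batch, channels)` regardless of clip length, it can feed a fixed-size output layer directly, which is what makes a single global pooling layer sufficient for variable-length audio.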
Related papers
- Acoustic Scene Classification Using Multi-layer Temporal Pooling Based On Convolutional Neural Network (2019)
- Convolutional Gated Recurrent Neural Network Incorporating Spatial Features For Audio Tagging (2017)
- Acoustic Scene Classification Using Bilinear Pooling On Time-liked And Frequency-liked Convolution Neural Network (2020)
- Acoustic Scene Classification Using Convolutional Neural Network And Multiple-width Frequency-delta Data Augmentation (2016)
- Attention And Localization Based On A Deep Convolutional Recurrent Model For Weakly Supervised Audio Tagging (2017)
- Audio Scene Classification With Deep Recurrent Neural Networks (2017)
- A Deep Neural Network For Audio Classification With A Classifier Attention Mechanism (2020)
- Combining High-level Features Of Raw Audio Waves And Mel-spectrograms For Audio Tagging (2018)