Combining High-level Features Of Raw Audio Waves And Mel-spectrograms For Audio Tagging
2018 · Marcel Lederle, Benjamin Wilhelm
Abstract
In this paper, we describe our contribution to Task 2 of the DCASE 2018 Audio Challenge. While it has become ubiquitous to utilize an ensemble of machine learning methods for classification tasks to obtain better predictive performance, the majority of ensemble methods combine predictions rather than learned features. We propose a single-model method that combines learned high-level features computed from log-scaled mel-spectrograms and raw audio data. These features are learned separately by two Convolutional Neural Networks, one for each input type, and then combined by densely connected layers within a single network. This relatively simple approach along with data augmentation ranks among the best two percent in the Freesound General-Purpose Audio Tagging Challenge on Kaggle.
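The feature-level fusion described in the abstract can be illustrated with a minimal NumPy forward pass. This is a sketch under stated assumptions, not the authors' architecture: each branch is reduced to a single convolution with global max pooling, the spectrogram branch treats mel bands as input channels of a 1-D convolution, weights are random, and all sizes (64 mel bands, 16 features per branch, 32 hidden units, 41 output classes) and names such as `fused_forward` are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CLASSES = 41  # assumption: number of output tags
N_MELS = 64     # assumption: number of mel bands in the log-mel input


def conv1d_relu(x, w, b, stride):
    """Valid 1-D convolution followed by ReLU.

    x: (in_ch, T), w: (out_ch, in_ch, k), b: (out_ch,)
    """
    k = w.shape[2]
    # Sliding windows over time: (in_ch, n_frames, k)
    windows = np.lib.stride_tricks.sliding_window_view(x, k, axis=1)
    windows = windows[:, ::stride, :]
    y = np.einsum("ctk,ock->ot", windows, w) + b[:, None]
    return np.maximum(y, 0.0)


def branch_features(x, w, b, stride):
    """Conv layer + global max pooling over time -> fixed-size feature vector."""
    return conv1d_relu(x, w, b, stride).max(axis=1)


def init_params(n_feat=16, n_hidden=32):
    """Random weights; a real model would learn these by training."""
    return {
        "w_wave": rng.normal(0, 0.1, (n_feat, 1, 9)),
        "b_wave": np.zeros(n_feat),
        "w_mel": rng.normal(0, 0.1, (n_feat, N_MELS, 3)),
        "b_mel": np.zeros(n_feat),
        "w_fc": rng.normal(0, 0.1, (n_hidden, 2 * n_feat)),
        "b_fc": np.zeros(n_hidden),
        "w_out": rng.normal(0, 0.1, (N_CLASSES, n_hidden)),
        "b_out": np.zeros(N_CLASSES),
    }


def fused_forward(wave, logmel, params):
    """Combine learned features (not predictions) from both input types."""
    # Branch 1: raw waveform, shape (T,) -> (1, T)
    f_wave = branch_features(wave[None, :], params["w_wave"],
                             params["b_wave"], stride=4)
    # Branch 2: log-mel spectrogram, shape (N_MELS, frames)
    f_mel = branch_features(logmel, params["w_mel"],
                            params["b_mel"], stride=1)
    # Fusion: concatenate the two feature vectors, then dense layers
    fused = np.concatenate([f_wave, f_mel])
    hidden = np.maximum(params["w_fc"] @ fused + params["b_fc"], 0.0)
    logits = params["w_out"] @ hidden + params["b_out"]
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


params = init_params()
wave = rng.normal(size=16000)            # placeholder for a raw audio clip
logmel = rng.normal(size=(N_MELS, 100))  # placeholder for a log-mel spectrogram
probs = fused_forward(wave, logmel, params)
```

Because each branch ends in global pooling, the concatenated feature vector has a fixed size regardless of clip length, which is what lets a single set of dense layers sit on top of both branches. A prediction-level ensemble would instead average two separate class-probability vectors.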
Related papers
- Convolutional Gated Recurrent Neural Network Incorporating Spatial Features For Audio Tagging (2017) · 13.23
- Sample Mixed-based Data Augmentation For Domestic Audio Tagging (2018) · 0.00
- Sample-level CNN Architectures For Music Auto-tagging Using Raw Waveforms (2017) · 13.23
- Multi-level And Multi-scale Feature Aggregation Using Pre-trained Convolutional Neural Networks For Music Auto-tagging (2017) · 15.43
- Attention And Localization Based On A Deep Convolutional Recurrent Model For Weakly Supervised Audio Tagging (2017) · 11.39
- Sample-level Deep Convolutional Neural Networks For Music Auto-tagging Using Raw Waveforms (2017) · 0.00
- Utilizing Domain Knowledge In End-to-end Audio Processing (2017) · 0.00
- Audio Classification Of Low Feature Spectrograms Utilizing Convolutional Neural Networks (2024) · 5.84