Multi-level And Multi-scale Feature Aggregation Using Pre-trained Convolutional Neural Networks For Music Auto-tagging
2017 · Jongpil Lee, Juhan Nam
Abstract
Music auto-tagging is often handled in a similar manner to image classification by regarding the 2D audio spectrogram as image data. However, music auto-tagging is distinguished from image classification in that the tags are highly diverse and have different levels of abstraction. Considering this issue, we propose a convolutional neural network (CNN)-based architecture that embraces multi-level and multi-scale features. The architecture is trained in three steps. First, we conduct supervised feature learning to capture local audio features using a set of CNNs with different input sizes. Second, we extract audio features from each layer of the pre-trained convolutional networks separately and aggregate them all together given a long audio clip. Finally, we put them into fully-connected networks and make final predictions of the tags. Our experiments show that using the combination of multi-level and multi-scale features is highly effective in music auto-tagging and the proposed method
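The second step of the abstract (extracting features from each layer of several pre-trained CNNs and aggregating them over a long clip) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how features are pooled, so max-pooling over time within each segment and averaging across segments are assumptions, and the names `pool_clip`, `aggregate_clip_features`, and the random stand-in activations are hypothetical. The final fully-connected classifier is omitted.

```python
import random


def pool_clip(segments):
    """Summarize one layer's activations for a whole clip.

    segments: list of segments; each segment is a list of time frames;
    each frame is a list of channel values.
    Assumption: max over time within a segment, then mean over segments.
    """
    per_segment = [[max(col) for col in zip(*frames)] for frames in segments]
    return [sum(col) / len(col) for col in zip(*per_segment)]


def aggregate_clip_features(activations):
    """Concatenate pooled features across all (scale, level) pairs.

    activations: dict mapping (scale_name, layer_index) to that layer's
    per-segment activations for one clip (hypothetical layout).
    """
    clip_vector = []
    for key in sorted(activations):
        clip_vector.extend(pool_clip(activations[key]))
    return clip_vector


# Stand-in activations: two input scales, three layers each.
# Random numbers replace real CNN outputs purely for illustration.
rng = random.Random(0)
activations = {}
for scale, num_segments in (("short", 16), ("long", 4)):
    for level, channels in ((1, 32), (2, 64), (3, 128)):
        activations[(scale, level)] = [
            [[rng.gauss(0.0, 1.0) for _ in range(channels)] for _ in range(5)]
            for _ in range(num_segments)
        ]

clip_vector = aggregate_clip_features(activations)
print(len(clip_vector))  # 2 scales * (32 + 64 + 128) channels = 448
```

The resulting fixed-length vector is what a fully-connected network would take as input in the third step; a shorter input size yields more segments per clip, but the pooled vector length depends only on the channel counts.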
Related papers
- Sample-level CNN Architectures For Music Auto-tagging Using Raw Waveforms (2017) · 13.23
- Sample-level Deep Convolutional Neural Networks For Music Auto-tagging Using Raw Waveforms (2017) · 0.00
- Automatic Tagging Using Deep Convolutional Neural Networks (2016) · 0.00
- Convolutional Recurrent Neural Networks For Music Classification (2016) · 18.98
- How Low Can You Go? Reducing Frequency And Time Resolution In Current CNN Architectures For Music Auto-tagging (2019) · 4.52
- Audio-based Music Classification With Densenet And Data Augmentation (2019) · 10.48
- Convolutional Gated Recurrent Neural Network Incorporating Spatial Features For Audio Tagging (2017) · 13.23
- Explaining Deep Convolutional Neural Networks On Music Classification (2016) · 0.00