Sample-level CNN Architectures For Music Auto-tagging Using Raw Waveforms
2017 · Taejun Kim, Jongpil Lee, Juhan Nam
Abstract
Recent work has shown that the end-to-end approach using convolutional neural networks (CNNs) is effective in various types of machine learning tasks. For audio signals, the approach takes raw waveforms as input using a 1-D convolution layer. In this paper, we improve the 1-D CNN architecture for music auto-tagging by adopting building blocks from state-of-the-art image classification models, ResNets and SENets, and adding multi-level feature aggregation to it. We compare different combinations of these modules in building CNN architectures. The results show that they achieve significant improvements over previous state-of-the-art models on the MagnaTagATune dataset and comparable results on the Million Song Dataset. Furthermore, we analyze and visualize our model to show how the 1-D CNN operates.
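To make the ingredients named in the abstract concrete, here is a minimal NumPy sketch of a 1-D CNN forward pass on a raw waveform that combines a residual connection, a squeeze-and-excitation (SE) gate, and multi-level feature aggregation. It is an illustration under stated assumptions, not the authors' implementation: the filter size of 3 with stride 3 in the first layer, the channel count, the SE reduction ratio, the ordering inside the block, the random weights, and all function names (conv1d, squeeze_excite, res_se_block, forward) are hypothetical choices that the abstract does not specify.

```python
import numpy as np

def conv1d(x, w, stride=1):
    """Valid 1-D cross-correlation. x: (C_in, T), w: (C_out, C_in, K)."""
    c_out, _, k = w.shape
    t_out = (x.shape[1] - k) // stride + 1
    out = np.empty((c_out, t_out))
    for t in range(t_out):
        window = x[:, t * stride : t * stride + k]          # (C_in, K)
        out[:, t] = np.tensordot(w, window, axes=([1, 2], [0, 1]))
    return out

def squeeze_excite(x, w1, w2):
    """SE gate. x: (C, T), w1: (C // r, C), w2: (C, C // r)."""
    z = x.mean(axis=1)                        # squeeze: average over time
    s = np.maximum(w1 @ z, 0.0)               # bottleneck + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))    # sigmoid, one weight per channel
    return x * gate[:, None]                  # excite: rescale each channel

def res_se_block(x, w_conv, w1, w2, pool=3):
    """Conv -> ReLU -> SE -> residual add -> max pool (assumed ordering)."""
    padded = np.pad(x, ((0, 0), (1, 1)))      # zero "same" padding for K = 3
    h = np.maximum(conv1d(padded, w_conv), 0.0)
    h = squeeze_excite(h, w1, w2)
    h = h + x                                 # residual (skip) connection
    t = (h.shape[1] // pool) * pool           # drop frames that do not fill a pool window
    return h[:, :t].reshape(h.shape[0], -1, pool).max(axis=2)

def forward(wave, n_blocks=3, channels=8, reduction=4, seed=0):
    """Return the last feature map and the multi-level aggregated feature."""
    rng = np.random.default_rng(seed)         # random weights, for shape checking only
    w_in = 0.1 * rng.standard_normal((channels, 1, 3))
    x = np.maximum(conv1d(wave, w_in, stride=3), 0.0)   # sample-level front end
    feats = []
    for _ in range(n_blocks):
        w_conv = 0.1 * rng.standard_normal((channels, channels, 3))
        w1 = 0.1 * rng.standard_normal((channels // reduction, channels))
        w2 = 0.1 * rng.standard_normal((channels, channels // reduction))
        x = res_se_block(x, w_conv, w1, w2)
        feats.append(x.max(axis=1))           # global max pool at this level
    return x, np.concatenate(feats)           # concatenate features from all levels
```

For example, calling forward on a mono input of shape (1, 729) with the defaults above yields a final feature map of shape (8, 9) and an aggregated vector of shape (24,), one 8-dimensional summary per block. A real tagger would feed that vector to a trained classifier over the tag vocabulary.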
Related papers
- Sample-level Deep Convolutional Neural Networks For Music Auto-tagging Using Raw Waveforms (2017)
- Multi-level And Multi-scale Feature Aggregation Using Pre-trained Convolutional Neural Networks For Music Auto-tagging (2017)
- How Low Can You Go? Reducing Frequency And Time Resolution In Current CNN Architectures For Music Auto-tagging (2019)
- Automatic Tagging Using Deep Convolutional Neural Networks (2016)
- MuSLCAT: Multi-scale Multi-level Convolutional Attention Transformer For Discriminative Music Modeling On Raw Waveforms (2021)
- Convolutional Recurrent Neural Networks For Music Classification (2016)
- Audio-based Music Classification With Densenet And Data Augmentation (2019)
- Convolutional Gated Recurrent Neural Network Incorporating Spatial Features For Audio Tagging (2017)