Stacked Convolutional And Recurrent Neural Networks For Music Emotion Recognition
2017 · Miroslav Malik, Sharath Adavanne, Konstantinos Drossos, et al.
Abstract
This paper studies emotion recognition from musical tracks in the two-dimensional valence-arousal (V-A) emotional space. We propose a method based on convolutional (CNN) and recurrent (RNN) neural networks that has significantly fewer parameters than the state-of-the-art method for the same task. The method uses one CNN layer followed by two branches of RNNs, trained separately for arousal and valence. We evaluated it on the 'MediaEval2015 emotion in music' dataset and achieved a root-mean-square error (RMSE) of 0.202 for arousal and 0.268 for valence, which is the best result reported on this dataset.
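For readers unfamiliar with the reported metric, the following is a minimal sketch of how an RMSE can be computed separately for arousal and valence. The function name `rmse` and all sample values are hypothetical illustrations, not data from the paper, and the abstract does not say how errors are aggregated across songs or frames, so the simple pooling over all values here is an assumption.

```python
import math


def rmse(predicted, target):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(target):
        raise ValueError("sequences must have the same length")
    if not predicted:
        raise ValueError("sequences must be non-empty")
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical annotations and predictions (NOT taken from the dataset).
arousal_true = [0.10, 0.30, -0.20, 0.05]
arousal_pred = [0.00, 0.40, -0.10, 0.15]
valence_true = [0.50, 0.20, -0.40, 0.00]
valence_pred = [0.30, 0.20, -0.10, 0.10]

# Each emotion dimension is scored on its own, mirroring the
# separate arousal and valence figures quoted in the abstract.
arousal_rmse = rmse(arousal_pred, arousal_true)
valence_rmse = rmse(valence_pred, valence_true)
print(f"arousal RMSE: {arousal_rmse:.3f}")
print(f"valence RMSE: {valence_rmse:.3f}")
```

Lower values indicate predictions closer to the annotations, so an arousal RMSE of 0.202 against a valence RMSE of 0.268 means arousal was predicted more accurately than valence.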
Related papers
- A Multimodal Approach Towards Emotion Recognition Of Music Using Audio And Lyrical Content (2018)
- Emotion Recognition From Speech (2019)
- Convolutional Recurrent Neural Networks For Music Classification (2016)
- Multimodal Fusion With Deep Neural Networks For Audio-video Emotion Recognition (2019)
- Emotion Recognition System From Speech And Visual Information Based On Convolutional Neural Networks (2020)
- MMVA: Multimodal Matching Based On Valence And Arousal Across Images, Music, And Musical Captions (2025)
- Multi-modality In Music: Predicting Emotion In Music From High-level Audio Features And Lyrics (2023)
- ADFF: Attention Based Deep Feature Fusion Approach For Music Emotion Recognition (2022)