ADFF: Attention Based Deep Feature Fusion Approach For Music Emotion Recognition
2022 · Zi Huang, Shulei Ji, Zhilan Hu, et al.
Abstract
Music emotion recognition (MER), a sub-task of music information retrieval (MIR), has developed rapidly in recent years. However, learning affect-salient features remains a challenge. In this paper, we propose an end-to-end attention-based deep feature fusion (ADFF) approach for MER. Taking only the log Mel-spectrogram as input, the method uses an adapted VGGNet as a spatial feature learning module (SFLM) to obtain spatial features at different levels. These features are then fed into a squeeze-and-excitation (SE) attention-based temporal feature learning module (TFLM) to obtain multi-level emotion-related spatial-temporal features (ESTFs), which discriminate emotions well in the final emotion space. In addition, a novel data processing step is devised that cuts the single-channel input into multiple channels, improving computational efficiency while preserving the quality of MER. Experiments show that our proposed method achieves 10.43% and 4.82% relative improvement of valence and arousal respect
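The two mechanisms named in the abstract, cutting a single-channel spectrogram into multiple channels and squeeze-and-excitation (SE) channel attention, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no segment count, layer sizes, or reduction ratio, so `num_segments`, the weight matrices `w1` and `w2`, and the function names `split_into_channels` and `se_attention` are all assumptions chosen for illustration.

```python
import numpy as np


def split_into_channels(log_mel, num_segments):
    """Cut a (n_mels, n_frames) spectrogram into equal time segments and
    stack them as channels: (num_segments, n_mels, n_frames // num_segments).
    The equal-length, non-overlapping split is an assumption."""
    n_mels, n_frames = log_mel.shape
    seg_len = n_frames // num_segments
    # Drop leftover frames that do not fill a whole segment.
    trimmed = log_mel[:, : seg_len * num_segments]
    # (n_mels, num_segments, seg_len) -> (num_segments, n_mels, seg_len)
    return trimmed.reshape(n_mels, num_segments, seg_len).transpose(1, 0, 2)


def se_attention(features, w1, w2):
    """Generic SE block on a (C, H, W) feature map.
    Squeeze: global average pool per channel.
    Excitation: FC -> ReLU -> FC -> sigmoid, giving one weight per channel.
    w1 has shape (C // r, C) and w2 has shape (C, C // r); both are
    hypothetical parameters that would normally be learned."""
    squeezed = features.mean(axis=(1, 2))            # (C,)
    hidden = np.maximum(0.0, w1 @ squeezed)          # (C // r,)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # (C,), each in (0, 1)
    return features * weights[:, None, None], weights


# Illustrative usage with random data standing in for a real log Mel-spectrogram.
rng = np.random.default_rng(0)
log_mel = rng.standard_normal((128, 1000))
channels = split_into_channels(log_mel, num_segments=4)  # shape (4, 128, 250)
w1 = rng.standard_normal((2, 4))  # reduction ratio r = 2 (assumed)
w2 = rng.standard_normal((4, 2))
reweighted, weights = se_attention(channels, w1, w2)
```

In the described method the SE attention operates on features produced by the SFLM rather than on the raw input; the stacked segments are used as the feature map here only to keep the sketch short.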
Related papers
- Audio-guided Fusion Techniques For Multimodal Emotion Analysis (2024)
- Multi-modality In Music: Predicting Emotion In Music From High-level Audio Features And Lyrics (2023)
- Enhancing Modal Fusion By Alignment And Label Matching For Multimodal Emotion Recognition (2024)
- MMER: Multimodal Multi-task Learning For Speech Emotion Recognition (2022)
- A Multimodal Approach Towards Emotion Recognition Of Music Using Audio And Lyrical Content (2018)
- MF-AED-AEC: Speech Emotion Recognition By Leveraging Multimodal Fusion, ASR Error Detection, And ASR Error Correction (2024)
- MERGE -- A Bimodal Audio-lyrics Dataset For Static Music Emotion Recognition (2024)
- Music Mood Detection Based On Audio And Lyrics With Deep Neural Net (2018)