Music Mood Detection Based On Audio And Lyrics With Deep Neural Net
2018 · Rémi Delbouys, Romain Hennequin, Francesco Piccoli, et al.
Abstract
We consider the task of multimodal music mood prediction based on the audio signal and the lyrics of a track. We reproduce the implementation of traditional feature-engineering-based approaches and propose a new model based on deep learning. We compare the performance of both approaches on a database containing 18,000 tracks with associated valence and arousal values and show that our approach outperforms classical models on the arousal detection task, and that both approaches perform equally on the valence prediction task. We also compare a posteriori fusion with fusion of modalities optimized simultaneously with each unimodal model, and observe a significant improvement in valence prediction. We release part of our database for comparison purposes.
Related papers
- A Multimodal Approach Towards Emotion Recognition Of Music Using Audio And Lyrical Content (2018)
- Multi-modality In Music: Predicting Emotion In Music From High-level Audio Features And Lyrics (2023)
- Exploiting Synchronized Lyrics And Vocal Features For Music Emotion Detection (2019)
- The Contribution Of Lyrics And Acoustics To Collaborative Understanding Of Mood (2022)
- Multimodal Fusion With Deep Neural Networks For Audio-video Emotion Recognition (2019)
- ADFF: Attention Based Deep Feature Fusion Approach For Music Emotion Recognition (2022)
- MERGE -- A Bimodal Audio-lyrics Dataset For Static Music Emotion Recognition (2024)
- Who Will Top The Charts? Multimodal Music Popularity Prediction Via Adaptive Fusion Of Modality Experts And Temporal Engagement Modeling (2025)