Multi-modality In Music: Predicting Emotion In Music From High-level Audio Features And Lyrics
2023 · Tibor Krols, Yana Nikolova, Ninell Oldenburg
Abstract
This paper tests whether a multi-modal approach to music emotion recognition (MER) outperforms a uni-modal one when using high-level song features and lyrics. We use 11 song features retrieved from the Spotify API, combined with lyrics features including sentiment, TF-IDF, and ANEW, to predict valence and arousal (Russell, 1980) scores on the Deezer Mood Detection Dataset (DMDD) (Delbouys et al., 2018) with 4 different regression models. We find that, of the 11 high-level song features, mainly 5 contribute to performance, and that multi-modal features do better than audio alone when predicting valence. We made our code publicly available.
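As a rough illustration of what combining audio and lyrics features can look like, the sketch below builds TF-IDF vectors from tokenized lyrics and concatenates them with per-song audio features (early fusion). It is not the authors' pipeline: the toy lyrics, the three audio values per song, the smoothed IDF variant, and the helper names `tfidf_vectors` and `fuse` are all assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF; this variant is an assumption, the abstract does not specify one.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty lyrics
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


def fuse(audio_features, lyric_vector):
    """Early fusion: concatenate audio and lyric features into one vector."""
    return list(audio_features) + list(lyric_vector)


# Hypothetical toy data: two songs, three made-up audio features each.
lyrics = [["love", "sun", "love"], ["rain", "alone"]]
audio = [[0.8, 0.6, 0.9], [0.2, 0.3, 0.1]]

vocab, lyric_vecs = tfidf_vectors(lyrics)
fused = [fuse(a, v) for a, v in zip(audio, lyric_vecs)]
```

Each row of `fused` could then be passed to a regression model with valence or arousal as the target. Dropping the lyric part of the vector gives the audio-only baseline against which the multi-modal features are compared.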
Related papers
- Music Mood Detection Based On Audio And Lyrics With Deep Neural Net (2018)
- A Multimodal Approach Towards Emotion Recognition Of Music Using Audio And Lyrical Content (2018)
- MERGE -- A Bimodal Audio-lyrics Dataset For Static Music Emotion Recognition (2024)
- Expressivity-aware Music Performance Retrieval Using Mid-level Perceptual Features And Emotion Word Embeddings (2024)
- ADFF: Attention Based Deep Feature Fusion Approach For Music Emotion Recognition (2022)
- Exploiting Synchronized Lyrics And Vocal Features For Music Emotion Detection (2019)
- The Contribution Of Lyrics And Acoustics To Collaborative Understanding Of Mood (2022)
- Semantic Matters: Multimodal Features For Affective Analysis (2025)