MERGE -- A Bimodal Audio-lyrics Dataset For Static Music Emotion Recognition
2024 · Pedro Lima Louro, Hugo Redinho, Ricardo Santos, et al.
Abstract
The Music Emotion Recognition (MER) field has seen steady developments in recent years, with contributions from feature engineering, machine learning, and deep learning. The landscape has also shifted from audio-centric systems to bimodal ensembles that combine audio and lyrics. However, a lack of public, sizable and quality-controlled bimodal databases has hampered the development and improvement of bimodal audio-lyrics systems. This article proposes three new audio, lyrics, and bimodal MER research datasets, collectively referred to as MERGE, which were created using a semi-automatic approach. To comprehensively assess the proposed datasets and establish a baseline for benchmarking, we conducted several experiments for each modality, using feature engineering, machine learning, and deep learning methodologies. Additionally, we propose and validate fixed train-validation-test splits. The obtained results confirm the viability of the proposed datasets, achieving the best overall result
Related papers
- Multi-modality In Music: Predicting Emotion In Music From High-level Audio Features And Lyrics (2023)
- Exploiting Synchronized Lyrics And Vocal Features For Music Emotion Detection (2019)
- ADFF: Attention Based Deep Feature Fusion Approach For Music Emotion Recognition (2022)
- A Multimodal Approach Towards Emotion Recognition Of Music Using Audio And Lyrical Content (2018)
- Music Mood Detection Based On Audio And Lyrics With Deep Neural Net (2018)
- Musictm-dataset For Joint Representation Learning Among Sheet Music, Lyrics, And Musical Audio (2020)
- MMER: Multimodal Multi-task Learning For Speech Emotion Recognition (2022)
- Audio-guided Fusion Techniques For Multimodal Emotion Analysis (2024)