Expressivity-aware Music Performance Retrieval Using Mid-level Perceptual Features And Emotion Word Embeddings
2024 · Shreyan Chowdhury, Gerhard Widmer
Abstract
This paper explores a specific sub-task of cross-modal music retrieval. We consider the delicate task of retrieving, from a set of different performances of the same musical piece, the performance or rendition that matches a description of its style, expressive character, or emotion. We observe that a general-purpose cross-modal system trained to learn a common text-audio embedding space does not yield optimal results for this task. By introducing two changes -- one each to the text encoder and the audio encoder -- we demonstrate improved performance on a dataset of piano performances and associated free-text descriptions. On the text side, we use emotion-enriched word embeddings (EWE), and on the audio side, we extract mid-level perceptual features instead of generic audio embeddings. Our results highlight the effectiveness of mid-level perceptual features learnt from music and emotion-enriched word embeddings learnt from emotion-labelled text in capturing musical expression in a cr
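The retrieval setup described in the abstract can be sketched as ranking the candidate performances of one piece by cosine similarity between a text-query embedding and each performance's audio embedding in a shared space. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy vectors, the performance names, and the helper names `cosine` and `rank_performances` are hypothetical, and the encoders that would produce the embeddings (EWE-based on the text side, mid-level perceptual features on the audio side) are assumed to exist upstream.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_performances(query_vec, performance_vecs):
    """Return (name, similarity) pairs sorted from best to worst match."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in performance_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical audio embeddings for three performances of the same piece.
performances = {
    "performance_a": [0.9, 0.1, 0.0],
    "performance_b": [0.1, 0.8, 0.3],
    "performance_c": [0.4, 0.4, 0.4],
}

# Hypothetical embedding of a free-text description, in the same space.
query = [1.0, 0.0, 0.0]

ranking = rank_performances(query, performances)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The ranking step is independent of how the embeddings are obtained, so the two encoder changes the abstract mentions would only affect the vectors passed in.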
Related papers
- Multi-modality In Music: Predicting Emotion In Music From High-level Audio Features And Lyrics (2023) 0.00
- ADFF: Attention Based Deep Feature Fusion Approach For Music Emotion Recognition (2022) 0.00
- A Multimodal Approach Towards Emotion Recognition Of Music Using Audio And Lyrical Content (2018) 0.00
- Exploiting Synchronized Lyrics And Vocal Features For Music Emotion Detection (2019) 0.00
- Disentangling Score Content And Performance Style For Joint Piano Rendering And Transcription (2025) 0.00
- Semantic Matters: Multimodal Features For Affective Analysis (2025) 0.00
- Music Mood Detection Based On Audio And Lyrics With Deep Neural Net (2018) 0.00
- Fusion Approaches For Emotion Recognition From Speech Using Acoustic And Text-based Features (2024) 12.25