Expressivity-aware Music Performance Retrieval Using Mid-level Perceptual Features And Emotion Word Embeddings

Shreyan Chowdhury, Gerhard Widmer · FIRE 2023: Forum for Information Retrieval Evaluation · 2024

This paper explores a specific sub-task of cross-modal music retrieval. We consider the delicate task of retrieving, from a set of different performances of the same piece, the performance or rendition that best matches a description of its style, expressive character, or emotion. We observe that a general-purpose cross-modal system trained to learn a common text-audio embedding space does not yield optimal results for this task. By introducing two changes, one to the text encoder and one to the audio encoder, we demonstrate improved performance on a dataset of piano performances and associated free-text descriptions. On the text side, we use emotion-enriched word embeddings (EWE); on the audio side, we extract mid-level perceptual features instead of generic audio embeddings. Our results highlight the effectiveness of mid-level perceptual features learnt from music, and of emotion-enriched word embeddings learnt from emotion-labelled text, in capturing musical expression in a cross-modal setting. Additionally, our interpretable mid-level features provide a route for introducing explainability into the retrieval and downstream recommendation processes.
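
The retrieval step described above can be illustrated with a minimal sketch. It assumes that a text description and each candidate performance have already been mapped into a shared embedding space, and that candidates are ranked by cosine similarity; the abstract does not state which similarity measure or projection the authors use, so both are assumptions here. The function names, performance labels, and three-dimensional vectors are hypothetical toy values, not outputs of EWE or of a mid-level feature extractor.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_performances(query_vec, performance_vecs):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in performance_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings of three performances of the same piece,
# assumed to live in the same space as the text query.
performances = {
    "performance_a": [0.9, 0.1, 0.2],
    "performance_b": [0.1, 0.8, 0.3],
    "performance_c": [0.4, 0.4, 0.4],
}

# Hypothetical embedding of a free-text description of expressive character.
query = [1.0, 0.0, 0.1]

ranking = rank_performances(query, performances)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setting the query lies closest to `performance_a`, so it is ranked first. In the setting the abstract describes, the query vector would come from the text encoder and the performance vectors from the audio encoder.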