Contrastive Learning For Cross-modal Artist Retrieval
2023 · Andres Ferraro, Jaehun Kim, Sergio Oramas, et al.
Abstract
Music retrieval and recommendation applications often rely on content features encoded as embeddings, which provide vector representations of items in a music dataset. Numerous complementary embeddings can be derived from processing items originally represented in several modalities, e.g., audio signals, user interaction data, or editorial data. However, data of any given modality might not be available for all items in any music dataset. In this work, we propose a method based on contrastive learning to combine embeddings from multiple modalities and explore the impact of the presence or absence of embeddings from diverse modalities in an artist similarity task. Experiments on two datasets suggest that our contrastive method outperforms single-modality embeddings and baseline algorithms for combining modalities, both in terms of artist retrieval accuracy and coverage. Improvements with respect to other methods are particularly significant for less popular query artists. We demonstrate
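As a rough illustration of the general idea of aligning embeddings from two modalities with a contrastive objective, the sketch below computes a symmetric InfoNCE loss in NumPy. It is not the authors' implementation: the choice of InfoNCE, the `info_nce` function name, the temperature value, and the random "audio" and "text" arrays are all assumptions made for illustration, and the embeddings are assumed to be already projected to a shared dimension.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def info_nce(a, b, temperature=0.1):
    """Symmetric InfoNCE loss between row-aligned embeddings a and b (n x d).

    Row i of `a` and row i of `b` are treated as two views (modalities) of the
    same artist; every other row in the batch serves as a negative.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature  # (n, n) scaled cosine similarities

    def cross_entropy_on_diagonal(scores):
        # Numerically stable log-softmax over each row.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        # The matching pair for row i sits at column i.
        return -np.mean(np.diag(log_prob))

    # Average the a->b and b->a retrieval directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))


# Hypothetical data: 8 artists, 16-dimensional embeddings per modality.
rng = np.random.default_rng(0)
audio = rng.normal(size=(8, 16))
text_aligned = audio + 0.01 * rng.normal(size=(8, 16))  # near-identical views
text_misaligned = np.roll(text_aligned, 1, axis=0)      # every pair mismatched

loss_aligned = info_nce(audio, text_aligned)
loss_misaligned = info_nce(audio, text_misaligned)
```

In this toy setup the loss is low when the two modalities agree on which rows belong together and high when the pairing is broken, which is the signal a trainable projection would be optimized against. Handling items with a missing modality, which the abstract highlights, is not covered by this sketch.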
Related papers
- Contrastive Audio-language Learning For Music (2022) 0.00
- Exploring Modality-agnostic Representations For Music Classification (2021) 0.00
- Self-supervised Contrastive Learning For Robust Audio-sheet Music Retrieval Systems (2023) 5.24
- Audio-visual Embedding For Cross-modal Music Video Retrieval Through Supervised Deep CCA (2019) 11.93
- Cross-modal Music Retrieval And Applications: An Overview Of Key Methodologies (2019) 12.68
- Towards Robust And Truly Large-scale Audio-sheet Music Retrieval (2023) 4.52
- Video And Audio Are Images: A Cross-modal Mixer For Original Data On Video-audio Retrieval (2023) 7.16
- Adversarial Cross-modal Retrieval Via Learning And Transferring Single-modal Similarities (2019) 8.60