On The Effect Of Data-augmentation On Local Embedding Properties In The Contrastive Learning Of Music Audio Representations
2024 · Matthew C. McCallum, Matthew E. P. Davies, Florian Henkel, et al.
Abstract
Audio embeddings are crucial tools in understanding large catalogs of music. Embeddings are typically evaluated by the performance they provide across a wide range of downstream tasks; however, few studies have investigated the local properties of the embedding spaces themselves, which are important in the nearest neighbor algorithms commonly used in music search and recommendation. In this work we show that when learning audio representations on music datasets via contrastive learning, musical properties that are typically homogeneous within a track (e.g., key and tempo) are reflected in the locality of neighborhoods in the resulting embedding space. By applying appropriate data augmentation strategies, the localisation of such properties can not only be reduced, but the localisation of other attributes can also be increased. For example, the locality of features such as pitch and tempo, which are less relevant to non-expert listeners, may be mitigated while improving the locality of more salient fea
Related papers
- Contrastive Learning For Cross-modal Artist Retrieval (2023) 0.00
- Self-supervised Contrastive Learning For Robust Audio-sheet Music Retrieval Systems (2023) 5.24
- Improving Natural-language-based Audio Retrieval With Transfer Learning And Audio & Text Augmentations (2022) 0.00
- Contrastive Audio-language Learning For Music (2022) 0.00
- Audio-visual Embedding For Cross-modal Musicvideo Retrieval Through Supervised Deep CCA (2019) 11.93
- Similar But Faster: Manipulation Of Tempo In Music Audio Embeddings For Tempo Prediction And Search (2024) 5.84
- Self-supervised Auxiliary Loss For Metric Learning In Music Similarity-based Retrieval And Auto-tagging (2023) 0.00
- Matching Text And Audio Embeddings: Exploring Transfer-learning Strategies For Language-based Audio Retrieval (2022) 0.00