On Negative Sampling For Contrastive Audio-text Retrieval
2022 · Huang Xie, Okko Räsänen, Tuomas Virtanen
Abstract
This paper investigates negative sampling for contrastive learning in audio-text retrieval. Negative sampling here means selecting negatives (either audio clips or textual descriptions) from a pool of candidates for a given positive audio-text pair. We explore sampling strategies based on model-estimated within-modality and cross-modality relevance scores for audio and text samples. Keeping the training setup of the retrieval system from [1] fixed, we study eight sampling strategies, including hard and semi-hard negative sampling. Experimental results show that retrieval performance varies dramatically across strategies. In particular, selecting semi-hard negatives with cross-modality scores improves performance in both text-to-audio and audio-to-text retrieval. We also show that feature collapse occurs when sampling hard negatives with cross-modality scores.
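To make the difference between hard and semi-hard negatives concrete, here is a minimal sketch of score-based negative selection. It is not the authors' implementation: the function name `sample_negatives`, the score matrix layout, the semi-hard definition (the highest-scoring negative that still scores below the positive pair), and the fallback rule are all assumptions made for illustration.

```python
import numpy as np

def sample_negatives(scores, strategy="semi-hard"):
    """Pick one negative text index per audio clip (hypothetical helper).

    Assumption: scores[i, j] is a cross-modality relevance score between
    audio clip i and text j, and the diagonal holds the positive pairs.
    """
    scores = np.asarray(scores, dtype=float)
    positive = np.diag(scores)[:, None]      # score of each positive pair
    candidates = scores.copy()
    np.fill_diagonal(candidates, -np.inf)    # a positive is never its own negative

    if strategy == "hard":
        # Hard negative: the highest-scoring non-matching text.
        return candidates.argmax(axis=1)

    if strategy == "semi-hard":
        # Semi-hard negative (assumed definition): the highest-scoring
        # non-matching text that still scores below the positive pair.
        below = np.where(candidates < positive, candidates, -np.inf)
        chosen = below.argmax(axis=1)
        # Assumed fallback: use the hard negative when nothing scores
        # below the positive pair.
        no_valid = np.isinf(below.max(axis=1))
        chosen[no_valid] = candidates.argmax(axis=1)[no_valid]
        return chosen

    raise ValueError(f"unknown strategy: {strategy}")

# Toy scores for three audio clips (rows) and three texts (columns).
toy_scores = [[0.90, 0.95, 0.50],
              [0.20, 0.80, 0.70],
              [0.60, 0.10, 0.30]]
hard = sample_negatives(toy_scores, "hard")
semi_hard = sample_negatives(toy_scores, "semi-hard")
```

In the toy matrix, the hard strategy picks text 1 for clip 0 even though it scores above the true pair, while the semi-hard strategy skips it and picks text 2. Swapping the rows and columns of the score matrix would give the audio-negative case for a text query.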
Related papers
- Positive And Negative Sampling Strategies For Self-supervised Learning On Audio-video Data (2024)
- Contrastive Latent Space Reconstruction Learning For Audio-text Retrieval (2023)
- CLN-VC: Text-free Voice Conversion Based On Fine-grained Style Control And Contrastive Learning With Negative Samples Augmentation (2023)
- All Information Is Necessary: Integrating Speech Positive And Negative Information By Contrastive Learning For Speech Enhancement (2023)
- Introducing Auxiliary Text Query-modifier To Content-based Audio Retrieval (2022)
- Contextual Speech Recognition With Difficult Negative Training Examples (2018)
- Augment, Drop & Swap: Improving Diversity In LLM Captions For Efficient Music-text Representation Learning (2024)
- SupCLAP: Controlling Optimization Trajectory Drift In Audio-text Contrastive Learning With Support Vector Regularization (2025)