SupCLAP: Controlling Optimization Trajectory Drift In Audio-text Contrastive Learning With Support Vector Regularization
2025 · Jiehui Luo, Yuguo Yin, Yuxin Xie, et al.
Abstract
Contrastive language-audio pretraining, which aims to unify multimodal representations in a shared embedding space, serves as a cornerstone for building a wide range of applications, from cross-modal retrieval to cutting-edge multimodal large language models. However, we find that the perpendicular component of the pushing force from negative samples in contrastive learning is a double-edged sword: it contains rich supplementary information from negative samples, yet its unconstrained nature causes optimization trajectory drift and training instability. To address this, we propose Support Vector Regularization (SVR), a method that introduces an auxiliary support vector to control this perpendicular component, aiming to harness its rich information while mitigating the associated trajectory drift. The efficacy of SVR is critically governed by its semantic radius, for which we explore two unsupervised modeling strategies: direct parameterization and an adaptive radius predictor module en
Related papers
- CLAPSpeech: Learning Prosody From Text Context With Contrastive Language-audio Pre-training (2023) 0.00
- Discriminative Speaker Representation Via Contrastive Learning With Class-aware Attention In Angular Space (2022) 8.60
- Robust Data2vec: Noise-robust Speech Representation Learning For ASR By Combining Regression And Improved Contrastive Learning (2022) 9.76
- Self-supervised Text-independent Speaker Verification Using Prototypical Momentum Contrastive Learning (2020) 12.93
- Asymmetric Clean Segments-guided Self-supervised Learning For Robust Speaker Verification (2023) 5.84
- Contrastive Learning For Improving ASR Robustness In Spoken Language Understanding (2022) 6.34
- ML-LMCL: Mutual Learning And Large-margin Contrastive Learning For Improving ASR Robustness In Spoken Language Understanding (2023) 0.00
- CLASP: Contrastive Language-speech Pretraining For Multilingual Multimodal Information Retrieval (2024) 0.00