Two-stage Triplet Loss Training With Curriculum Augmentation For Audio-visual Retrieval
2023 · Donghuo Zeng, Kazushi Ikeda
Abstract
Cross-modal retrieval models rely on triplet loss optimization to learn robust embedding spaces. However, existing methods often train these models in a single pass, overlooking the distinction between semi-hard and hard triplets during optimization, which leads to suboptimal performance. In this paper, we introduce a novel approach rooted in curriculum learning to address this problem. We propose a two-stage training paradigm that guides the model's learning from semi-hard to hard triplets. In the first stage, the model is trained on a set of semi-hard triplets, starting from a low-loss base. In the second stage, we augment the embeddings using an interpolation technique. This process identifies potential hard negatives, alleviating issues arising from high-loss functions due to a scarcity of hard triplets. Our approach then applies hard triplet mining in the aug
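The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the mining rules or the interpolation scheme, so the sketch assumes the commonly used definitions (semi-hard: d(a,p) < d(a,n) < d(a,p) + margin; hard: d(a,n) < d(a,p)) and a simple linear mix of same-class embeddings. All function names and the toy data are hypothetical, and no gradient-based training is shown, only triplet selection and the resulting loss.

```python
from itertools import combinations

import numpy as np


def pairwise_dist(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mine_triplets(anchors, cands, a_labels, c_labels, margin, mode):
    """Return (anchor, positive, negative) index triplets.

    Assumed definitions (not taken from the paper):
      semi-hard: d(a,p) < d(a,n) < d(a,p) + margin
      hard:      d(a,n) < d(a,p)
    """
    dist = pairwise_dist(anchors, cands)
    same = a_labels[:, None] == c_labels[None, :]
    triplets = []
    for a in range(len(anchors)):
        for p in np.flatnonzero(same[a]):
            for n in np.flatnonzero(~same[a]):
                d_ap, d_an = dist[a, p], dist[a, n]
                if mode == "semi-hard":
                    keep = d_ap < d_an < d_ap + margin
                else:
                    keep = d_an < d_ap
                if keep:
                    triplets.append((a, int(p), int(n)))
    return triplets


def triplet_loss(anchors, cands, triplets, margin):
    """Mean of max(0, d(a,p) - d(a,n) + margin) over the given triplets."""
    if not triplets:
        return 0.0
    dist = pairwise_dist(anchors, cands)
    losses = [max(0.0, dist[a, p] - dist[a, n] + margin) for a, p, n in triplets]
    return float(np.mean(losses))


def interpolate_within_class(emb, labels, lam=0.5):
    """Append linear mixes of same-class embedding pairs (assumed scheme)."""
    new_emb, new_labels = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        for i, j in combinations(idx, 2):
            new_emb.append(lam * emb[i] + (1.0 - lam) * emb[j])
            new_labels.append(c)
    if not new_emb:
        return emb, labels
    return (np.vstack([emb, np.array(new_emb)]),
            np.concatenate([labels, np.array(new_labels)]))


# Toy data: one audio anchor (class 0) and three visual embeddings.
margin = 1.0
audio = np.array([[0.0, 0.0]])
a_labels = np.array([0])
visual = np.array([[1.0, 0.0], [0.0, 1.5], [0.0, -1.5]])
v_labels = np.array([0, 1, 1])

# Stage 1: mine semi-hard triplets from the original embeddings.
semi_hard = mine_triplets(audio, visual, a_labels, v_labels, margin, "semi-hard")
stage1_loss = triplet_loss(audio, visual, semi_hard, margin)
hard_before = mine_triplets(audio, visual, a_labels, v_labels, margin, "hard")

# Stage 2: augment by interpolation, then mine hard triplets.
visual_aug, v_labels_aug = interpolate_within_class(visual, v_labels)
hard_after = mine_triplets(audio, visual_aug, a_labels, v_labels_aug, margin, "hard")
stage2_loss = triplet_loss(audio, visual_aug, hard_after, margin)

print(len(semi_hard), len(hard_before), len(hard_after))
```

In this toy setup the original embeddings contain no hard negatives for the anchor, while the midpoint of the two class-1 embeddings falls closer to the anchor than its positive and so becomes a hard negative after augmentation. This mirrors, in a very simplified form, the abstract's claim that interpolation helps when hard triplets are scarce.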
Related papers
- Unified Loss Of Pair Similarity Optimization For Vision-language Retrieval (2022) 0.00
- Dual-modal Attention-enhanced Text-video Retrieval With Triplet Partial Margin Contrastive Learning (2023) 8.82
- Deep Triplet Neural Networks With Cluster-cca For Audio-visual Cross-modal Retrieval (2019) 12.61
- Fast Training Of Triplet-based Deep Binary Embedding Networks (2016) 14.62
- A Quadruplet Loss For Enforcing Semantically Coherent Embeddings In Multi-output Classification Problems (2020) 6.77
- Triplet-center Loss For Multi-view 3D Object Retrieval (2018) 18.70
- Improved Embeddings With Easy Positive Triplet Mining (2019) 15.06
- Estimated Audio-caption Correspondences Improve Language-based Audio Retrieval (2024) 0.00