Variance-aware Loss Scheduling For Multimodal Alignment In Low-data Settings
2025 · Sneh Pillai
Abstract
Training vision-language models for image-text alignment typically requires large datasets to achieve robust performance. In low-data scenarios, standard contrastive learning can struggle to align modalities effectively due to overfitting and unstable training dynamics. In this paper, we propose a variance-aware loss scheduling approach that dynamically adjusts the weighting of the contrastive loss based on the statistical variability (uncertainty) in the model's alignment predictions. Using a subset of the Flickr8k image-caption dataset to simulate limited data conditions, we demonstrate that our approach improves image-text retrieval accuracy compared to a fixed-weight baseline. We also compare against other adaptive weighting strategies (using output entropy and cosine similarity spread) and find that variance-aware scheduling provides the best overall trade-off. Qualitatively, our method yields more distinct multimodal embeddings as shown by t-SNE visualizations. Moreover, in a str
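The abstract does not give the scheduling formula, so the following is only a minimal sketch of what a variance-aware weight on a contrastive loss could look like. It assumes a symmetric InfoNCE loss over a square image-text similarity matrix, and it assumes the weight grows linearly with the variance of the matched-pair similarities, clamped at a maximum. The function names, the parameters `base`, `alpha`, `max_weight`, and `temperature`, and the direction of the scaling are all hypothetical and are not taken from the paper.

```python
import math
from statistics import pvariance


def info_nce(sim, temperature=0.07):
    """Symmetric InfoNCE loss for a square similarity matrix (list of lists).

    sim[i][j] is the similarity between image i and caption j;
    matched pairs sit on the diagonal.
    """
    n = len(sim)

    def mean_row_cross_entropy(matrix):
        total = 0.0
        for i in range(n):
            logits = [value / temperature for value in matrix[i]]
            peak = max(logits)  # subtract the max for numerical stability
            log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
            total += log_z - logits[i]
        return total / n

    transposed = [list(column) for column in zip(*sim)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (mean_row_cross_entropy(sim) + mean_row_cross_entropy(transposed))


def variance_aware_weight(sim, base=1.0, alpha=1.0, max_weight=2.0):
    """Hypothetical schedule: weight = base + alpha * Var(matched similarities).

    The linear form, the clamp, and the choice to increase the weight with
    variance are assumptions made for illustration only.
    """
    matched = [sim[i][i] for i in range(len(sim))]
    return min(base + alpha * pvariance(matched), max_weight)


def weighted_contrastive_loss(sim, **schedule_kwargs):
    """Contrastive loss scaled by the (assumed) variance-aware weight."""
    return variance_aware_weight(sim, **schedule_kwargs) * info_nce(sim)
```

In a training loop, `sim` would be recomputed for each batch, so the weight tracks how unevenly the model scores its matched pairs at that point in training. A fixed-weight baseline corresponds to `alpha=0`.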
Related papers
- Modest-align: Data-efficient Alignment For Vision-language Models (2025)
- Curriculum Learning For Data-efficient Vision-language Alignment (2022)
- Unified Loss Of Pair Similarity Optimization For Vision-language Retrieval (2022)
- Covmatch: Cross-covariance Guided Multimodal Dataset Distillation With Trainable Text Encoder (2025)
- Contrastive Learning Of Visual-semantic Embeddings (2021)
- Vision-language Modelling For Radiological Imaging And Reports In The Low Data Regime (2023)
- Multimodal Representation Alignment For Cross-modal Information Retrieval (2025)
- Efficient Medical Vision-language Alignment Through Adapting Masked Vision Models (2025)