TNG-CLIP: Training-time Negation Data Generation For Negation Awareness Of CLIP
2025 · Yuliang Cai, Jesse Thomason, Mohammad Rostami
Abstract
Vision-language models (VLMs), such as CLIP, have demonstrated strong performance across a range of downstream tasks. However, CLIP remains limited in negation understanding: the ability to recognize the absence or exclusion of a concept. Existing methods address this problem by using a large language model (LLM) to generate large-scale datasets of image captions containing negation, which are then used to fine-tune CLIP. However, these methods are both time- and compute-intensive, and their evaluations are typically restricted to image-text matching tasks. To go beyond these limits, we (1) introduce a training-time negation data generation pipeline in which negation captions are generated during training, adding only 2.5% to training time, and (2) propose the first benchmark, Neg-TtoI, for evaluating text-to-image generation models on prompts containing negation, assessing a model's ability to produce semantically accurate images. We show that our proposed method, TNG-CLIP, achi
Related papers
- Contrastive Vision-language Learning With Paraphrasing And Negation (2025) · 0.00
- SpaceVLM: Sub-space Modeling Of Negation In Vision-language Models (2025) · 0.00
- The Effect Of Negation On CLIP In Medical Imaging: Limitations Of Contrastive Language-image Pretraining (2025) · 0.00
- TripletCLIP: Improving Compositional Reasoning Of CLIP Via Synthetic Vision-language Negatives (2024) · 4.52
- Contrasting Intra-modal And Ranking Cross-modal Hard Negatives To Enhance Visio-linguistic Compositional Understanding (2023) · 12.11
- Towards Effective Negation Modeling In Joint Audio-text Models For Music (2026) · 0.00
- FG-CLIP: Fine-grained Visual And Textual Alignment (2025) · 5.75
- No Captions, No Problem: Captionless 3D-CLIP Alignment With Hard Negatives Via CLIP Knowledge And LLMs (2024) · 0.00