Safe-CLIP: Removing NSFW Concepts From Vision-and-language Models
2023 · Samuele Poppi, Tobia Poppi, Federico Cocchi, et al.
Abstract
Large-scale vision-and-language models, such as CLIP, are typically trained on web-scale data, which can introduce inappropriate content and lead to the development of unsafe and biased behavior. This, in turn, hampers their applicability in sensitive and trustworthy contexts and could raise significant concerns in their adoption. Our research introduces a novel approach to enhancing the safety of vision-and-language models by diminishing their sensitivity to NSFW (not safe for work) inputs. In particular, our methodology seeks to sever "toxic" linguistic and visual concepts, unlearning the linkage between unsafe linguistic or visual items and unsafe regions of the embedding space. We show how this can be done by fine-tuning a CLIP model on synthetic data obtained from a large language model trained to convert between safe and unsafe sentences, and a text-to-image generator. We conduct extensive experiments on the resulting embedding space for cross-modal retrieval, text-to-image, and
Related papers
- Learning The Visualness Of Text Using Large Vision-language Models (2023) 4.52
- They're All Doctors: Synthesizing Diverse Counterfactuals To Mitigate Associative Bias (2024) 0.00
- CalibCLIP: Contextual Calibration Of Dominant Semantics For Text-driven Image Retrieval (2025) 0.00
- FairCLIP: Social Bias Elimination Based On Attribute Prototype Learning And Representation Neutralization (2022) 0.00
- Koo-fu CLIP: Closed-form Adaptation Of Vision-language Models Via Fukunaga-koontz Linear Discriminant Analysis (2026) 0.00
- Seeing What Matters: Empowering CLIP With Patch Generation-to-selection (2025) 5.24
- Adversarially Robust CLIP Models Can Induce Better (robust) Perceptual Metrics (2025) 3.58
- CLIPS: An Enhanced CLIP Framework For Learning With Synthetic Captions (2024) 0.00