Koo-Fu CLIP: Closed-form Adaptation of Vision-language Models via Fukunaga-Koontz Linear Discriminant Analysis
2026 · Matej Suchanek, Klara Janouskova, Ondrej Vasatko, et al.
Abstract
Visual-language models such as CLIP provide powerful general-purpose representations, but their raw embeddings are not optimized for supervised classification, often exhibiting limited class separation and excessive dimensionality. We propose Koo-Fu CLIP, a supervised CLIP adaptation method based on Fukunaga-Koontz Linear Discriminant Analysis, which operates in a whitened embedding space to suppress within-class variation and enhance between-class discrimination. The resulting closed-form linear projection reshapes the geometry of CLIP embeddings, improving class separability while performing effective dimensionality reduction, and provides a lightweight and efficient adaptation of CLIP representations. Across large-scale ImageNet benchmarks, nearest visual prototype classification in the Koo-Fu CLIP space improves top-1 accuracy from 75.1% to 79.1% on ImageNet-1K, with consistent gains persisting as the label space expands to 14K and 21K classes. The method supports substantial com
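The pipeline the abstract describes (whiten the embeddings, keep the directions with the least within-class variation, then classify by nearest visual prototype) can be illustrated with a minimal NumPy sketch. This is a generic Fukunaga-Koontz-style LDA and may differ from the authors' exact procedure. The function names, the eps regularizer, the Euclidean distance for prototype matching, and the synthetic data standing in for CLIP image embeddings are all assumptions made for illustration.

```python
import numpy as np


def fit_fk_lda(X, y, n_components, eps=1e-6):
    """Closed-form projection (hypothetical helper, not the authors' code).

    X: (n, d) embeddings, y: (n,) integer labels.
    Returns the global mean and a (d, n_components) projection matrix W.
    """
    d = X.shape[1]
    mean = X.mean(axis=0)
    Xc = X - mean

    # Within-class scatter: spread of samples around their own class mean.
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Dk = X[y == c] - X[y == c].mean(axis=0)
        Sw += Dk.T @ Dk

    # Total scatter St = Sw + Sb; a small ridge (assumed) keeps it invertible.
    St = Xc.T @ Xc
    St_reg = St + eps * (np.trace(St) / d) * np.eye(d)

    # Whitening transform P, so that P.T @ St_reg @ P = I.
    evals, evecs = np.linalg.eigh(St_reg)
    P = evecs / np.sqrt(evals)

    # In the whitened space Sw' + Sb' = I, so both share eigenvectors and the
    # smallest within-class eigenvalues pair with the largest between-class ones.
    _, U = np.linalg.eigh(P.T @ Sw @ P)  # eigenvalues in ascending order
    W = P @ U[:, :n_components]
    return mean, W


def class_prototypes(Z, y):
    """Mean projected embedding per class (the visual prototypes)."""
    classes = np.unique(y)
    protos = np.stack([Z[y == c].mean(axis=0) for c in classes])
    return classes, protos


def predict(Z, classes, protos):
    """Assign each row of Z to the nearest prototype (Euclidean, assumed)."""
    d2 = ((Z[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]


# Synthetic stand-in for embeddings: 3 classes, 16 dimensions.
rng = np.random.default_rng(0)
class_means = 3.0 * rng.normal(size=(3, 16))
y = np.repeat(np.arange(3), 100)
X = class_means[y] + rng.normal(size=(300, 16))

mean, W = fit_fk_lda(X, y, n_components=2)
Z = (X - mean) @ W  # reduced, discriminative representation
classes, protos = class_prototypes(Z, y)
acc = (predict(Z, classes, protos) == y).mean()
```

With C classes the between-class scatter has rank at most C - 1, so n_components above that adds no discriminative directions in this sketch. Because the projection comes from two eigendecompositions, no gradient-based training is involved.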
Related papers
- Clip-lite: Information Efficient Visual Representation Learning With Language Supervision (2021) 2.35
- Liteembed: Adapting CLIP To Rare Classes (2026) 0.00
- Finetuning CLIP To Reason About Pairwise Differences (2024) 0.00
- Distill CLIP (DCLIP): Enhancing Image-text Retrieval Via Cross-modal Transformer Distillation (2025) 0.00
- Superclip: CLIP With Simple Classification Supervision (2025) 0.00
- FG-CLIP: Fine-grained Visual And Textual Alignment (2025) 5.75
- CLIP-KD: An Empirical Study Of CLIP Model Distillation (2023) 17.57
- LLM2CLIP: Powerful Language Model Unlocks Richer Cross-modality Representation (2024) 2.26