ImageNet-10
Emerging · 15 papers using it
First seen: 2025
ImageNet-10 is a subset of the ImageNet dataset restricted to a small number of classes (ten, going by its name). It is used to evaluate multimodal systems on vision-language understanding tasks.
Papers using ImageNet-10 (15)
- Mitigating Visual Hallucinations in Multimodal Systems through Retrieval-Augmented Reliability-Aware Inference
- Immuno-VLM: Immunizing Large Vision-Language Models via Generative Semantic Antibodies for Open-World Trustworthiness
- JEPA-T: Joint-embedding Predictive Architecture With Text Fusion For Image Generation
- Multimodal Large Language Models as Image Classifiers
- EvoTok: A Unified Image Tokenizer via Residual Latent Evolution for Visual Understanding and Generation
- High-Fidelity Text-to-Image Generation from Pre-Trained Vision-Language Models via Distribution-Conditioned Diffusion Decoding
- Explaining CLIP Zero-shot Predictions Through Concepts
- Koo-Fu CLIP: Closed-Form Adaptation of Vision-Language Models via Fukunaga-Koontz Linear Discriminant Analysis
- A Hidden Semantic Bottleneck in Conditional Embeddings of Diffusion Transformers
- Cross-modal Proxy Evolving for OOD Detection with Vision-Language Models
- Vision Also You Need: Navigating Out-of-Distribution Detection with Multimodal Large Language Model
- Explaining Similarity in Vision-Language Encoders with Weighted Banzhaf Interactions
- ViLU: Learning Vision-Language Uncertainties for Failure Prediction
- Beginning With You: Perceptual-initialization Improves Vision-language Representation And Alignment
- Image Recognition With Vision And Language Embeddings Of VLMs