MM-Vet
Emerging
8 papers using it
First seen: 2023
MM-Vet is a multimodal benchmark used to evaluate Large Vision-Language Models (LVLMs), including how well they mitigate object hallucinations.
Papers using MM-Vet (8)
- Energy-Driven Adaptive Visual Token Pruning for Efficient Vision-Language Models
- Mitigating Object Hallucinations in LVLMs via Attention Imbalance Rectification
- VisPlay: Self-Evolving Vision-Language Models from Images
- Token-Level Inference-Time Alignment for Vision-Language Models
- Fusion to Enhance: Fusion Visual Encoder to Enhance Multimodal Language Model
- Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping
- ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM
- Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?