MMBench

Canonical

32 papers using it

First seen: 2023

Dataset Card for "MMBench": more information needed.

Papers using MMBench (32)

STEP3-VL-10B Technical Report · 2026

MotionVLA: Vision-Language-Action Model for Humanoid Motion · 2026

ID-Align: RoPE-Conscious Position Remapping for Dynamic High-Resolution Adaptation in Vision-Language Models · 2025

FTibSuite: A Comprehensive Resource Suite for Tibetan Vision-Language Modeling · 2026

Multilingual Training and Evaluation Resources for Vision-Language Models · 2026

VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models · 2026

Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation · 2026

Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation · 2026

ACPO: Counteracting Likelihood Displacement in Vision-Language Alignment with Asymmetric Constraints · 2026

Annotation-Efficient Vision-Language Model Adaptation to the Polish Language Using the LLaVA Framework · 2026

How to Take a Memorable Picture? Empowering Users with Actionable Feedback · 2026

Text-Guided Layer Fusion Mitigates Hallucination in Multimodal LLMs · 2026

Scaling Vision Language Models for Pharmaceutical Long Form Video Reasoning on Industrial GenAI Platform · 2026

Gated Relational Alignment via Confidence-based Distillation for Efficient VLMs · 2026

Stitch and Tell: A Structured Multimodal Data Augmentation Method for Spatial Understanding · 2025

HybridToken-VLM: Hybrid Token Compression for Vision-Language Models · 2025

AttAnchor: Guiding Cross-Modal Token Alignment in VLMs with Attention Anchors · 2025

The Telephone Game: Evaluating Semantic Drift in Unified Models · 2025

Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection · 2025

Language-specific Layer Matters: Efficient Multilingual Enhancement for Large Vision-Language Models · 2025

PostAlign: Multimodal Grounding as a Corrective Lens for MLLMs · 2025

From Evaluation to Defense: Advancing Safety in Video Large Language Models · 2025

GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning · 2025

MOVE: A Mixture-of-Vision-Encoders Approach for Domain-Focused Vision-Language Processing · 2025

MMBench: Is Your Multi-modal Model an All-around Player? · 2023 · 33 cites

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition · 2023 · 31 cites

MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning · 2023 · 18 cites

EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model · 2024 · 1 cite

Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning · 2024

Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping · 2024

Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy · 2024

Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding · 2024
