
MVPBench

Emerging

10 papers using it

First seen: 2025

MVPBench is a curated benchmark designed to evaluate visual physical reasoning in multimodal large language models through interleaved multi-image inputs that require coherent, step-by-step reasoning paths.


Papers using MVPBench (10)

Pushupbench: Your VLM Is Not Good At Counting Pushups (2026)

LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization (2026)

Clue Matters: Leveraging Latent Visual Clues to Empower Video Reasoning (2026)

MACD: Model-Aware Contrastive Decoding via Counterfactual Data (2026)

Improving Video Question Answering through query-based frame selection (2026)

Video Evidence to Reasoning Efficient Video Understanding via Explicit Evidence Grounding (2026)

Seeing Is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT (2025)

Enhancing Temporal Understanding in Video-LLMs Through Stacked Temporal Attention in Vision Encoders (2025)

Ro-bench: Large-scale Robustness Evaluation of MLLMs with Text-driven Counterfactual Videos (2025)

GAM-Agent: Game-theoretic and Uncertainty-aware Collaboration for Complex Visual Reasoning (2025)
