
MLVU

Emerging

14 papers using it

First seen: 2025

MLVU is a benchmark dataset used to evaluate the performance of multimodal large language models (MLLMs) in video understanding tasks.


Papers using MLVU (14)

VisReflect: Latent Visual Reflection for Fine-Grained Perception in Long Visual Context (2026)

QEVA: A Reference-Free Evaluation Metric for Narrative Video Summarization with Multimodal Question Answering (2026)

Event-Anchored Frame Selection for Effective Long-Video Understanding (2026)

Question-guided Visual Compression with Memory Feedback for Long-Term Video Understanding (2026)

Adaptive Greedy Frame Selection for Long Video Understanding (2026)

ForestPrune: High-ratio Visual Token Compression for Video Multimodal Large Language Models via Spatial-Temporal Forest Modeling (2026)

ReMoRa: Multimodal Large Language Model based on Refined Motion Representation for Long-Video Understanding (2026)

MSJoE: Jointly Evolving MLLM and Sampler for Efficient Long-Form Video Understanding (2026)

Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding (2026)

LiViBench: An Omnimodal Benchmark for Interactive Livestream Video Understanding (2026)

Video-o3: Native Interleaved Clue Seeking for Long Video Multi-Hop Reasoning (2026)

Towards Effective Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval (2025)

Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs (2025)

FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs (2025)
