MMBench
Canonical
32 papers using it
First seen in 2023
MMBench is a multilingual benchmark derived from widely used English datasets, designed to evaluate the performance of vision-language models across five European languages.
Papers using MMBench (32)
- STEP3-VL-10B Technical Report
- MotionVLA: Vision-Language-Action Model for Humanoid Motion
- Id-align: Rope-conscious Position Remapping For Dynamic High-resolution Adaptation In Vision-language Models
- Multilingual Training and Evaluation Resources for Vision-Language Models
- VLM-RobustBench: A Comprehensive Benchmark for Robustness of Vision-Language Models
- Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
- Perceptio: Perception Enhanced Vision Language Models via Spatial Token Generation
- ACPO: Counteracting Likelihood Displacement in Vision-Language Alignment with Asymmetric Constraints
- Annotation-Efficient Vision-Language Model Adaptation to the Polish Language Using the LLaVA Framework
- How to Take a Memorable Picture? Empowering Users with Actionable Feedback
- Text-Guided Layer Fusion Mitigates Hallucination in Multimodal LLMs
- Scaling Vision Language Models for Pharmaceutical Long Form Video Reasoning on Industrial GenAI Platform
- Gated Relational Alignment via Confidence-based Distillation for Efficient VLMs
- Stitch and Tell: A Structured Multimodal Data Augmentation Method for Spatial Understanding
- HybridToken-VLM: Hybrid Token Compression for Vision-Language Models
- AttAnchor: Guiding Cross-Modal Token Alignment in VLMs with Attention Anchors
- The Telephone Game: Evaluating Semantic Drift in Unified Models
- Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection
- Language-specific Layer Matters: Efficient Multilingual Enhancement For Large Vision-language Models
- Gam-agent: Game-theoretic And Uncertainty-aware Collaboration For Complex Visual Reasoning
- PostAlign: Multimodal Grounding as a Corrective Lens for MLLMs
- From Evaluation to Defense: Advancing Safety in Video Large Language Models
- MMTBENCH: A Unified Benchmark for Complex Multimodal Table Reasoning
- MOVE: A Mixture-of-Vision-Encoders Approach for Domain-Focused Vision-Language Processing
- MMBench: Is Your Multi-modal Model an All-around Player?
- InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
- MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
- EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model
- Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning
- Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
- Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy
- Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding