Qwen-2.5-VL
Emerging, 7 papers using it
First seen: 2025
Qwen2.5-VL is a multimodal large language model (MLLM) rather than a dataset or benchmark in the strict sense. The papers listed here use it as an evaluation target for visual tasks, assessing how effectively it processes and interprets visual information.
Papers using Qwen-2.5-VL (7)
- HAWK: Head Importance-Aware Visual Token Pruning in Multimodal Models
- Think-as-You-See: Streaming Chain-of-Thought Reasoning for Large Vision-Language Models
- HALP: Detecting Hallucinations in Vision-Language Models without Generating a Single Token
- Cost-Efficient Multimodal LLM Inference via Cross-Tier GPU Heterogeneity
- Medvlthinker: Simple Baselines For Multimodal Medical Reasoning
- MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models
- Nexus: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision