Yixuan Li

10 papers · 7 citations

Most-cited papers

Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning For Vision Language Models
2024 · 138 citations
PICLe: Eliciting Diverse Behaviors From Large Language Models With Persona In-Context Learning
2024 · 31 citations
AutoDroid-V2: Boosting SLM-based GUI Agents Via Code Generation
2024 · 26 citations
InterControl: Zero-shot Human Interaction Generation By Controlling Every Joint
2023 · 21 citations
VQualA 2025 Challenge On Visual Quality Comparison For Large Multimodal Models: Methods And Results
2025 · 7 citations
HoloCine: Holistic Generation Of Cinematic Multi-shot Long Video Narratives
2025
QDepth-VLA: Quantized Depth Prediction As Auxiliary Supervision For Vision-language-action Models
2025
LSVOS 2025 Challenge Report: Recent Advances In Complex Video Object Segmentation
2025
MagicQuillV2: Precise And Interactive Image Editing With Layered Visual Cues
2025

Topics

In-Context Learning Code Model Architecture Visual QA & Reasoning Benchmarks Vision-Language Models Video-Language Uncategorized Vision-Language Evaluation