Awesome Speech Audio

Yixuan Li

10 papers · 7 citations
Most-cited papers
  • Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning For Vision Language Models
    2024 · 138 citations
  • PICLe: Eliciting Diverse Behaviors From Large Language Models With Persona In-context Learning
    2024 · 31 citations
  • AutoDroid-V2: Boosting SLM-based GUI Agents Via Code Generation
    2024 · 26 citations
  • InterControl: Zero-shot Human Interaction Generation By Controlling Every Joint
    2023 · 21 citations
  • VQualA 2025 Challenge On Visual Quality Comparison For Large Multimodal Models: Methods And Results
    2025 · 7 citations
  • HoloCine: Holistic Generation Of Cinematic Multi-shot Long Video Narratives
    2025
  • QDepth-VLA: Quantized Depth Prediction As Auxiliary Supervision For Vision-language-action Models
    2025
  • LSVOS 2025 Challenge Report: Recent Advances In Complex Video Object Segmentation
    2025
  • MagicQuillV2: Precise And Interactive Image Editing With Layered Visual Cues
    2025
Topics
  • In-Context Learning
  • Code
  • Model Architecture
  • Visual QA & Reasoning
  • Benchmarks
  • Vision-Language Models
  • Video-Language
  • Uncategorized
  • Vision-Language
  • Evaluation

© 2026 Awesome Papers.