Awesome Similarity Search

Xizhou Zhu

20 papers · 394 citations
Most-cited papers
  • How Far Are We To GPT-4V? Closing The Gap To Commercial Multimodal Models With Open-source Suites
    2024 · 339 citations
  • VL-LTR: Learning Class-wise Visual-linguistic Representation For Long-tailed Visual Recognition
    2021 · 42 citations
  • VisionLLM v2: An End-to-end Generalist Multimodal Large Language Model For Hundreds Of Vision-language Tasks
    2024 · 5 citations
  • SynerGen-VL: Towards Synergistic Image Understanding And Generation With Vision Experts And Token Folding
    2024 · 4 citations
  • PVC: Progressive Visual Token Compression For Unified Image And Video Processing In Large Vision-language Models
    2024 · 2 citations
  • Ghost In The Minecraft: Generally Capable Agents For Open-world Environments Via Large Language Models With Text-based Knowledge And Memory
    2023
  • ZeroGUI: Automating Online GUI Learning At Zero Human Cost
    2025
  • MMBench-GUI: Hierarchical Multi-platform Evaluation Framework For GUI Agents
    2025
  • MiroThinker: Pushing The Performance Boundaries Of Open-source Research Agents Via Model, Context, And Interactive Scaling
    2025
  • Collaborative Visual Navigation
    2021
Topics
  • Visual Language
  • 3D Vision
  • Code Agents
  • Image Generation
  • Video Understanding
  • Multi-Agent
  • Object Detection
  • Memory
  • Uncategorized
  • Evaluation

