Awesome Speech Audio

Bowen Zhang

18 papers · 1269 citations
Most-cited papers
  • Ferret: Refer And Ground Anything Anywhere At Any Granularity
    2023 · 503 citations
  • MM1: Methods, Analysis & Insights From Multimodal LLM Pre-training
    2024 · 261 citations
  • Structured 3D Latents For Scalable And Versatile 3D Generation
    2024 · 125 citations
  • Extract, Define, Canonicalize: An LLM-based Framework For Knowledge Graph Construction
    2024 · 106 citations
  • Ferret-v2: An Improved Baseline For Referring And Grounding With Large Language Models
    2024 · 101 citations
  • MM1.5: Methods, Analysis & Insights From Multimodal LLM Fine-tuning
    2024 · 70 citations
  • MV-Adapter: Multimodal Video Transfer Learning For Video Text Retrieval
    2023 · 19 citations
  • STAIR: Learning Sparse Text And Image Representation In Grounded Tokens
    2023 · 14 citations
  • Learning To Represent Image And Text With Denotation Graph
    2020 · 10 citations
  • MOFI: Learning Image Representations From Noisy Entity Annotated Images
    2023
  • Hunyuan3D-Omni: A Unified Framework For Controllable Generation Of 3D Assets
    2025
  • Revisit Large-scale Image-caption Data In Pre-training Multimodal Foundation Models
    2024
  • SAM 3D: 3Dfy Anything In Images
    2025
Topics
Model Architecture · Vision-Language · Image Generation · Training Techniques · 3D Vision · Image Retrieval · Code · Fine-Tuning · Visual Language · RAG

© 2026 Awesome Papers.