Bowen Zhang
18 papers · 1269 citations
Most-cited papers
- Ferret: Refer And Ground Anything Anywhere At Any Granularity · 2023 · 503 citations
- MM1: Methods, Analysis & Insights From Multimodal LLM Pre-training · 2024 · 261 citations
- Structured 3D Latents For Scalable And Versatile 3D Generation · 2024 · 125 citations
- Extract, Define, Canonicalize: An LLM-based Framework For Knowledge Graph Construction · 2024 · 106 citations
- Ferret-v2: An Improved Baseline For Referring And Grounding With Large Language Models · 2024 · 101 citations
- MM1.5: Methods, Analysis & Insights From Multimodal LLM Fine-tuning · 2024 · 70 citations
- MV-Adapter: Multimodal Video Transfer Learning For Video Text Retrieval · 2023 · 19 citations
- STAIR: Learning Sparse Text And Image Representation In Grounded Tokens · 2023 · 14 citations
- Learning To Represent Image And Text With Denotation Graph · 2020 · 10 citations
- MOFI: Learning Image Representations From Noisy Entity Annotated Images · 2023
- Hunyuan3D-Omni: A Unified Framework For Controllable Generation Of 3D Assets · 2025
- Revisit Large-scale Image-caption Data In Pre-training Multimodal Foundation Models · 2024
- SAM 3D: 3Dfy Anything In Images · 2025
Topics