Yinfei Yang
10 papers · 0 citations
Most-cited papers
- Ferret: Refer And Ground Anything Anywhere At Any Granularity · 2023 · 503 citations
- MM1: Methods, Analysis & Insights From Multimodal LLM Pre-training · 2024 · 261 citations
- Ferret-UI: Grounded Mobile UI Understanding With Multimodal LLMs · 2024 · 174 citations
- Guiding Instruction-based Image Editing Via Multimodal Large Language Models · 2023 · 172 citations
- Ferret-v2: An Improved Baseline For Referring And Grounding With Large Language Models · 2024 · 101 citations
- Large Dual Encoders Are Generalizable Retrievers · 2021 · 90 citations
- Ferret-UI: Grounded Mobile UI Understanding With Multimodal LLMs · 2024 · 33 citations
- STAIR: Learning Sparse Text And Image Representation In Grounded Tokens · 2023 · 14 citations
- Masked Autoencoding Does Not Help Natural Language Supervision At Scale · 2023 · 5 citations
- MOFI: Learning Image Representations From Noisy Entity Annotated Images · 2023
- Pico-Banana-400K: A Large-scale Dataset For Text-guided Image Editing · 2025
- SO-Bench: A Structural Output Evaluation Of Multimodal LLMs · 2025
- CAR-Flow: Condition-aware Reparameterization Aligns Source And Target For Better Flow Matching · 2025
- UniVG: A Generalist Diffusion Model For Unified Image Generation And Editing · 2025
Topics