Awesome Speech Audio

Yinfei Yang

10 papers · 0 citations
Most-cited papers
  • Ferret: Refer And Ground Anything Anywhere At Any Granularity
    2023 · 503 citations
  • MM1: Methods, Analysis & Insights From Multimodal LLM Pre-training
    2024 · 261 citations
  • Ferret-UI: Grounded Mobile UI Understanding With Multimodal LLMs
    2024 · 174 citations
  • Guiding Instruction-based Image Editing Via Multimodal Large Language Models
    2023 · 172 citations
  • Ferret-v2: An Improved Baseline For Referring And Grounding With Large Language Models
    2024 · 101 citations
  • Large Dual Encoders Are Generalizable Retrievers
    2021 · 90 citations
  • Ferret-UI: Grounded Mobile UI Understanding With Multimodal LLMs
    2024 · 33 citations
  • STAIR: Learning Sparse Text And Image Representation In Grounded Tokens
    2023 · 14 citations
  • Masked Autoencoding Does Not Help Natural Language Supervision At Scale
    2023 · 5 citations
  • MOFI: Learning Image Representations From Noisy Entity Annotated Images
    2023
  • Pico-Banana-400K: A Large-scale Dataset For Text-guided Image Editing
    2025
  • SO-Bench: A Structural Output Evaluation Of Multimodal LLMs
    2025
  • CAR-Flow: Condition-aware Reparameterization Aligns Source And Target For Better Flow Matching
    2025
  • UniVG: A Generalist Diffusion Model For Unified Image Generation And Editing
    2025
Topics
Vision-Language · Model Architecture · Visual Language · Vision-Language Models · Image Retrieval · Code · Training Techniques · Fine-Tuning · Prompting · Uncategorized

© 2026 Awesome Papers.