Zhengyuan Yang
11 papers · 2 citations
Most-cited papers
- Scaling Up Vision-language Pre-training For Image Captioning · 2021 · 157 citations
- TAP: Text-aware Pre-training For Text-vqa And Text-caption · 2020 · 96 citations
- SAT: 2D Semantics Assisted Training For 3D Visual Grounding · 2021 · 90 citations
- Diagnostic Benchmark And Iterative Inpainting For Layout-guided Image Generation · 2023 · 6 citations
- GLIMPSE: Do Large Vision-language Models Truly Think With Videos Or Just Glimpse At Them? · 2025 · 2 citations
- Exploring A Unified Vision-centric Contrastive Alternatives On Multi-modal Web Documents · 2025
- Edival-agent: An Object-centric Framework For Automated, Fine-grained Evaluation Of Multi-turn Editing · 2025
- Glance: Accelerating Diffusion Models With 1 Sample · 2025
- Point-rft: Improving Multimodal Reasoning With Visually Grounded Reinforcement Finetuning · 2025
Topics