Jifeng Dai
14 papers · 15 citations
Most-cited papers
- InternVL: Scaling Up Vision Foundation Models And Aligning For Generic Visual-linguistic Tasks · 2023 · 2715 citations
- How Far Are We To GPT-4V? Closing The Gap To Commercial Multimodal Models With Open-source Suites · 2024 · 1136 citations
- How Far Are We To GPT-4V? Closing The Gap To Commercial Multimodal Models With Open-source Suites · 2024 · 339 citations
- Fast Convergence Of DETR With Spatially Modulated Co-attention · 2021 · 309 citations
- Tip-Adapter: Training-free Adaption Of CLIP For Few-shot Classification · 2022 · 306 citations
- InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-contextual Input And Output · 2024 · 192 citations
- Frozen CLIP Models Are Efficient Video Learners · 2022 · 156 citations
- VisionLLM v2: An End-to-end Generalist Multimodal Large Language Model For Hundreds Of Vision-language Tasks · 2024 · 149 citations
- FuseFormer: Fusing Fine-grained Information In Transformers For Video Inpainting · 2021 · 143 citations
- Mono-InternVL: Pushing The Boundaries Of Monolithic Multimodal Large Language Models With Endogenous Visual Pre-training · 2024 · 79 citations
- Spatial Frequency Modulation For Semantic Segmentation · 2025 · 15 citations
- Ghost In The Minecraft: Generally Capable Agents For Open-world Environments Via Large Language Models With Text-based Knowledge And Memory · 2023
- ZeroGUI: Automating Online GUI Learning At Zero Human Cost · 2025
- Mono-InternVL-1.5: Towards Cheaper And Faster Monolithic Multimodal Large Language Models · 2025
- MMBench-GUI: Hierarchical Multi-platform Evaluation Framework For GUI Agents · 2025
Topics