Xiaoyi Dong
14 papers · 3173 citations
Most-cited papers
- How Far Are We To GPT-4V? Closing The Gap To Commercial Multimodal Models With Open-source Suites · 2024 · 1136 citations
- CSWin Transformer: A General Vision Transformer Backbone With Cross-shaped Windows · 2021 · 1069 citations
- Are We On The Right Way For Evaluating Large Vision-language Models? · 2024 · 736 citations
- Mobile-Former: Bridging MobileNet And Transformer · 2021 · 583 citations
- InternLM2 Technical Report · 2024 · 378 citations
- InternLM-XComposer2: Mastering Free-form Text-image Composition And Comprehension In Vision-language Large Model · 2024 · 372 citations
- How Far Are We To GPT-4V? Closing The Gap To Commercial Multimodal Models With Open-source Suites · 2024 · 339 citations
- ShareGPT4V: Improving Large Multi-modal Models With Better Captions · 2023 · 237 citations
- InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-contextual Input And Output · 2024 · 192 citations
- MaskCLIP: Masked Self-distillation Advances Contrastive Language-image Pretraining · 2022 · 146 citations
Topics