Xiaoyi Dong

14 papers · 3173 citations

Most-cited papers

How Far Are We To GPT-4V? Closing The Gap To Commercial Multimodal Models With Open-source Suites
2024 · 1136 citations
CSWin Transformer: A General Vision Transformer Backbone With Cross-shaped Windows
2021 · 1069 citations
Are We On The Right Way For Evaluating Large Vision-language Models?
2024 · 736 citations
Mobile-Former: Bridging MobileNet And Transformer
2021 · 583 citations
InternLM2 Technical Report
2024 · 378 citations
InternLM-XComposer2: Mastering Free-form Text-image Composition And Comprehension In Vision-language Large Model
2024 · 372 citations
How Far Are We To GPT-4V? Closing The Gap To Commercial Multimodal Models With Open-source Suites
2024 · 339 citations
ShareGPT4V: Improving Large Multi-modal Models With Better Captions
2023 · 237 citations
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-contextual Input And Output
2024 · 192 citations
MaskCLIP: Masked Self-distillation Advances Contrastive Language-image Pretraining
2022 · 146 citations

Topics

Vision-Language · Model Architecture · Training Techniques · Fine-Tuning · Visual Language · In-Context Learning · Evaluation · Code · Prompting · Efficiency