Wenhai Wang
17 papers · 0 citations
Most-cited papers
- Pyramid Vision Transformer: A Versatile Backbone For Dense Prediction Without Convolutions · 2021 · 4239 citations
- InternVL: Scaling Up Vision Foundation Models And Aligning For Generic Visual-linguistic Tasks · 2023 · 2715 citations
- How Far Are We To GPT-4V? Closing The Gap To Commercial Multimodal Models With Open-source Suites · 2024 · 1136 citations
- How Far Are We To GPT-4V? Closing The Gap To Commercial Multimodal Models With Open-source Suites · 2024 · 339 citations
- InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-contextual Input And Output · 2024 · 192 citations
- Segmenting Transparent Objects In The Wild · 2020 · 153 citations
- VisionLLM v2: An End-to-end Generalist Multimodal Large Language Model For Hundreds Of Vision-language Tasks · 2024 · 149 citations
- PAN++: Towards Efficient And Accurate End-to-end Spotting Of Arbitrarily-shaped Text · 2021 · 100 citations
- ControlLLM: Augment Language Models With Tools By Searching On Graphs · 2023 · 65 citations
- VL-LTR: Learning Class-wise Visual-linguistic Representation For Long-tailed Visual Recognition · 2021 · 42 citations
- ScaleCUA: Scaling Open-source Computer Use Agents With Cross-platform Data · 2025
- ZeroGUI: Automating Online GUI Learning At Zero Human Cost · 2025
- Mono-InternVL-1.5: Towards Cheaper And Faster Monolithic Multimodal Large Language Models · 2025
- MMBench-GUI: Hierarchical Multi-platform Evaluation Framework For GUI Agents · 2025
- GenExam: A Multidisciplinary Text-to-image Exam · 2025
Topics