Hongsheng Li
31 papers · 2 citations
Most-cited papers
- UniFormer: Unifying Convolution And Self-attention For Visual Recognition · 2022 · 541 citations
- Learning Feature Pyramids For Human Pose Estimation · 2017 · 535 citations
- SPHINX: The Joint Mixing Of Weights, Tasks, And Visual Embeddings For Multi-modal Large Language Models · 2023 · 288 citations
- SPHINX-X: Scaling Data And Parameters For A Family Of Multi-modal Large Language Models · 2024 · 149 citations
- MathCoder-VL: Bridging Vision And Code For Enhanced Multimodal Mathematical Reasoning · 2025 · 2 citations
- MINT-CoT: Enabling Interleaved Visual Tokens In Mathematical Chain-of-thought Reasoning · 2025
- GoT-R1: Unleashing Reasoning Capability Of MLLM For Visual Generation With Reinforcement Learning · 2025
- Is Your VLM Sky-ready? A Comprehensive Spatial Intelligence Benchmark For UAV Navigation · 2025
Topics