Awesome Papers

Hongsheng Li

31 papers · 2 citations
Most-cited papers
  • UniFormer: Unifying Convolution And Self-attention For Visual Recognition
    2022 · 541 citations
  • Learning Feature Pyramids For Human Pose Estimation
    2017 · 535 citations
  • SPHINX: The Joint Mixing Of Weights, Tasks, And Visual Embeddings For Multi-modal Large Language Models
    2023 · 288 citations
  • SPHINX-X: Scaling Data And Parameters For A Family Of Multi-modal Large Language Models
    2024 · 149 citations
  • MathCoder-VL: Bridging Vision And Code For Enhanced Multimodal Mathematical Reasoning
    2025 · 2 citations
  • MINT-CoT: Enabling Interleaved Visual Tokens In Mathematical Chain-of-thought Reasoning
    2025
  • GoT-R1: Unleashing Reasoning Capability Of MLLM For Visual Generation With Reinforcement Learning
    2025
  • Is Your VLM Sky-ready? A Comprehensive Spatial Intelligence Benchmark For UAV Navigation
    2025
Topics
Vision-Language Models · Visual QA & Reasoning · Vision-Language · Model Architecture · Training Techniques · Benchmarks · Fine-Tuning · Code · Efficiency · cs.MM
