Renrui Zhang
12 papers · 0 citations
Most-cited papers
- Video-MME: The First-ever Comprehensive Evaluation Benchmark Of Multi-modal LLMs In Video Analysis · 2024 · 1125 citations
- Tip-Adapter: Training-free Adaption Of CLIP For Few-shot Classification · 2022 · 306 citations
- SPHINX: The Joint Mixing Of Weights, Tasks, And Visual Embeddings For Multi-modal Large Language Models · 2023 · 288 citations
- Point-Bind & Point-LLM: Aligning Point Cloud With Multi-modality For 3D Understanding, Generation, And Instruction Following · 2023 · 213 citations
- ManipLLM: Embodied Multimodal Large Language Model For Object-centric Robotic Manipulation · 2023 · 209 citations
- ImageBind-LLM: Multi-modality Instruction Tuning · 2023 · 174 citations
- PointCLIP V2: Prompting CLIP And GPT For Powerful 3D Open-world Learning · 2022 · 158 citations
- Frozen CLIP Models Are Efficient Video Learners · 2022 · 156 citations
- CALIP: Zero-shot Enhancement Of CLIP With Parameter-free Attention · 2022 · 91 citations
- Can Language Understand Depth? · 2022 · 57 citations
- Delving Into RL For Image Generation With CoT: A Study On DPO Vs. GRPO · 2025
- MINT-CoT: Enabling Interleaved Visual Tokens In Mathematical Chain-of-thought Reasoning · 2025
- AC-DiT: Adaptive Coordination Diffusion Transformer For Mobile Manipulation · 2025
- UniCTokens: Boosting Personalized Understanding And Generation Via Unified Concept Tokens · 2025
Topics