Di Zhang
21 papers · 0 citations
Most-cited papers
- ChemLLM: A Chemical Large Language Model · 2024 · 106 citations
- Video-LaVIT: Unified Video-language Pre-training With Decoupled Visual-motional Tokenization · 2024 · 94 citations
- Unified Language-vision Pretraining In LLM With Dynamic Discrete Visual Tokenization · 2023 · 87 citations
- MM-RLHF: The Next Step Forward In Multimodal LLM Alignment · 2025 · 78 citations
- ShieldLM: Empowering LLMs As Aligned, Customizable And Explainable Safety Detectors · 2024 · 56 citations
- ReCamMaster: Camera-controlled Generative Rendering From A Single Video · 2025 · 3 citations
- HAIC: Improving Human Action Understanding And Generation With Better Captions For Multi-modal Large Language Models · 2025 · 1 citation
- Imbalance In Balance: Online Concept Balancing In Generation Models · 2025
- MUSE: Multi-subject Unified Synthesis Via Explicit Layout Semantic Expansion · 2025
- TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning With Tens Of Thousands Vision Task Types · 2025
- PatchVSR: Breaking Video Diffusion Resolution Limits With Patch-wise Video Super-resolution · 2025
- Learning Video Generation For Robotic Manipulation With Collaborative Trajectory Control · 2025
- FullDiT2: Efficient In-context Conditioning For Video Diffusion Transformers · 2025
- MolReflect: Towards In-context Fine-grained Alignments Between Molecules And Texts · 2026
Topics