Li Yuan
19 papers · 5 citations
Most-cited papers
- Tokens-to-Token ViT: Training Vision Transformers From Scratch On ImageNet · 2021 · 1881 citations
- Video-LLaVA: Learning United Visual Representation By Alignment Before Projection · 2023 · 1441 citations
- LanguageBind: Extending Video-language Pretraining To N-modality By Language-based Semantic Alignment · 2023 · 398 citations
- LLM Lies: Hallucinations Are Not Bugs, But Features As Adversarial Examples · 2023 · 296 citations
- VOLO: Vision Outlooker For Visual Recognition · 2021 · 157 citations
- Chat-UniVi: Unified Visual Representation Empowers Large Language Models With Image And Video Understanding · 2023 · 135 citations
- Video-Bench: A Comprehensive Benchmark And Toolkit For Evaluating Video-based Large Language Models · 2023 · 105 citations
- LOOK-M: Look-once Optimization In KV Cache For Efficient Multimodal Long-context Inference · 2024 · 84 citations
- ViewCrafter: Taming Video Diffusion Models For High-fidelity Novel View Synthesis · 2024 · 50 citations
- LOOK-M: Look-once Optimization In KV Cache For Efficient Multimodal Long-context Inference · 2024 · 11 citations
- Collaborative Multi-LoRA Experts With Achievement-based Multi-tasks Loss For Unified Multimodal Information Extraction · 2025 · 3 citations
- E-4DGS: High-fidelity Dynamic Reconstruction From The Multi-view Event Cameras · 2025 · 2 citations
- Epona: Autoregressive Diffusion World Model For Autonomous Driving · 2025
- ImgEdit: A Unified Image Editing Dataset And Benchmark · 2025
- Does Understanding Inform Generation In Unified Multimodal Models? From Analysis To Path Forward · 2025
Topics