Xinyu Wang

19 papers · 6 citations

Most-cited papers

Can Large Vision-language Models Understand Multimodal Sarcasm?
2025 · 4 citations
Compute Only 16 Tokens In One Timestep: Accelerating Diffusion Transformers With Cluster-driven Feature Caching
2025 · 1 citation
P2MFDS: A Privacy-preserving Multimodal Fall Detection System For Elderly People In Bathroom Environments
2025 · 1 citation
Controllable Video Generation: A Survey
2025
Qwen3 Technical Report
2025
HybridTM: Combining Transformer And Mamba For 3D Semantic Segmentation
2025
ProxyWar: Dynamic Assessment Of LLM Code Generation In Game Arenas
2026

Topics

Uncategorized · Video-Language · Vision-Language Models · Code Agents · Evaluation Benchmarks · Multi-Agent