Yansong Tang
16 papers · 5 citations
Most-cited papers
- LAVT: Language-aware Vision Transformer For Referring Image Segmentation · 2021 · 335 citations
- ScalableViT: Rethinking The Context-oriented Generalization Of Vision Transformer · 2022 · 45 citations
- Learning From Temporal Spatial Cubism For Cross-dataset Skeleton-based Action Recognition · 2022 · 20 citations
- MADTP: Multimodal Alignment-guided Dynamic Token Pruning For Accelerating Vision-language Transformer · 2024 · 17 citations
- ATP-LLaVA: Adaptive Token Pruning For Large Vision Language Models · 2024 · 14 citations
- SAM2-LOVE: Segment Anything Model 2 In Language-aided Audio-visual Scenes · 2025 · 4 citations
- FADE: Frequency-aware Diffusion Model Factorization For Video Editing · 2025 · 1 citation
- Flash-VStream: Efficient Real-time Understanding For Long Video Streams · 2025
- Meta-CoT: Enhancing Granularity And Generalization In Image Editing · 2026
Topics