TF-CLIP: Learning Text-free CLIP For Video-based Person Re-identification
2023 · Chenyang Yu, Xuehu Liu, Yingquan Wang, et al.
Abstract
Large-scale language-image pre-trained models (e.g., CLIP) have shown superior performances on many cross-modal retrieval tasks. However, the problem of transferring the knowledge learned from such models to video-based person re-identification (ReID) has barely been explored. In addition, there is a lack of decent text descriptions in current ReID benchmarks. To address these issues, in this work, we propose a novel one-stage text-free CLIP-based learning framework named TF-CLIP for video-based person ReID. More specifically, we extract the identity-specific sequence feature as the CLIP-Memory to replace the text feature. Meanwhile, we design a Sequence-Specific Prompt (SSP) module to update the CLIP-Memory online. To capture temporal information, we further propose a Temporal Memory Diffusion (TMD) module, which consists of two key components: Temporal Memory Construction (TMC) and Memory Diffusion (MD). Technically, TMC allows the frame-level memories in a sequence to communicate wi
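The CLIP-Memory idea described in the abstract, replacing per-identity text features with identity-specific sequence features, can be illustrated with a minimal sketch. Everything here is an assumption for illustration rather than the authors' implementation: features are plain Python lists instead of CLIP encoder outputs, each memory entry is taken to be the mean of the sequence-level features of one identity, and the helper names (`build_memory`, `memory_logits`) and the temperature value are hypothetical. The SSP and TMD modules are not modeled.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mean_pool(vectors):
    """Element-wise average of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def build_memory(sequences, labels):
    """Assumed construction: one memory vector per identity, computed as the
    normalized mean of that identity's sequence-level features."""
    grouped = defaultdict(list)
    for frames, pid in zip(sequences, labels):
        grouped[pid].append(mean_pool(frames))  # frame features -> sequence feature
    return {pid: l2_normalize(mean_pool(feats)) for pid, feats in grouped.items()}


def memory_logits(frames, memory, temperature=0.07):
    """Cosine similarity between a query sequence and every memory entry,
    divided by a temperature (0.07 is an illustrative value)."""
    query = l2_normalize(mean_pool(frames))
    return {
        pid: sum(a * b for a, b in zip(query, entry)) / temperature
        for pid, entry in memory.items()
    }


def cross_entropy(logits, target):
    """Softmax cross-entropy over the memory entries, with the identity
    memory playing the role a text feature would play in CLIP."""
    peak = max(logits.values())
    log_z = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
    return log_z - logits[target]


if __name__ == "__main__":
    # Toy 2-D "frame features" for three sequences of two identities.
    toy_sequences = [[[1.0, 0.0], [1.0, 0.2]], [[0.9, 0.1]], [[0.0, 1.0], [0.1, 1.0]]]
    toy_labels = ["a", "a", "b"]
    toy_memory = build_memory(toy_sequences, toy_labels)
    toy_logits = memory_logits([[1.0, 0.1]], toy_memory)
    print(toy_logits, cross_entropy(toy_logits, "a"))
```

In this sketch the memory is computed once from fixed features; the abstract states that the actual method updates the CLIP-Memory online through the SSP module, which would replace the static `build_memory` step.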
Related papers
- Prompt Switch: Efficient CLIP Adaptation For Text-video Retrieval (2023)
- CLIP2Video: Mastering Video-text Retrieval Via Image CLIP (2021)
- CLIP4Clip: An Empirical Study Of CLIP For End To End Video Clip Retrieval (2021)
- Fine-tuned CLIP Models Are Efficient Video Learners (2022)
- Revisiting Temporal Modeling For CLIP-based Image-to-video Knowledge Transferring (2023)
- TVPR: Text-to-video Person Retrieval And A New Benchmark (2023)
- VideoCLIP-XL: Advancing Long Description Understanding For Video CLIP Models (2024)
- Frame-difference Guided Dynamic Region Perception For CLIP Adaptation In Text-video Retrieval (2025)