Prompt-aware Of Frame Sampling For Efficient Text-video Retrieval
2025 · Deyu Zhang, Tingting Long, Jinrui Zhang, et al.
Abstract
Enabling efficient text-video retrieval on edge-end devices is critical for real-world applications. Yet, existing methods face a critical challenge in balancing accuracy and computational efficiency: uniform frame sampling methods ensure content coverage but incur prohibitive computational costs, while salient-frame sampling methods reduce overhead but suffer from query-agnostic frame selection that biases retrieval results. To address this, we propose ProCLIP, a user-centric framework that achieves state-of-the-art accuracy with significantly improved efficiency. We design a prompt-aware frame sampling strategy that dynamically guides lightweight feature extractors using textual prompts to select semantically relevant frames, overcoming the limitations of existing salient-frame sampling methods which rely on static, query-agnostic selection criteria. Moreover, we adopt a two-stage candidate pruning strategy that combines rapid coarse filtering via a lightweight module with CLIP-power
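The two ideas named in the abstract, query-conditioned frame selection and coarse-then-fine candidate pruning, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring function, the feature extractors, or the pruning thresholds, so cosine similarity, top-k selection, mean pooling, and every name below (`select_frames`, `two_stage_retrieve`, the `"light"` and `"heavy"` feature keys) are assumptions made for illustration. The feature vectors are taken as precomputed inputs.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_frames(query_vec, frame_vecs, k):
    """Hypothetical prompt-aware sampling: keep the k frames most similar
    to the text query, returned in temporal (index) order."""
    ranked = sorted(
        range(len(frame_vecs)),
        key=lambda i: cosine(query_vec, frame_vecs[i]),
        reverse=True,
    )
    return sorted(ranked[:k])

def two_stage_retrieve(query_light, query_heavy, videos, k_frames, n_candidates):
    """Hypothetical two-stage pruning.

    videos: list of dicts with per-frame feature lists under "light"
    (cheap extractor) and "heavy" (stand-in for a stronger model).
    Returns video indices ranked best-first.
    """
    # Stage 1: coarse filtering with cheap features on query-selected frames.
    coarse = []
    for vid, v in enumerate(videos):
        idx = select_frames(query_light, v["light"], k_frames)
        score = sum(cosine(query_light, v["light"][i]) for i in idx) / len(idx)
        coarse.append((score, vid, idx))
    coarse.sort(key=lambda t: t[0], reverse=True)
    shortlist = coarse[:n_candidates]

    # Stage 2: re-rank only the shortlist, using heavy features
    # on the same selected frames.
    fine = []
    for _, vid, idx in shortlist:
        score = sum(cosine(query_heavy, videos[vid]["heavy"][i]) for i in idx) / len(idx)
        fine.append((score, vid))
    fine.sort(reverse=True)
    return [vid for _, vid in fine]

# Toy data: three videos with two frames each and 2-D features.
videos = [
    {"light": [[1, 0], [0, 1]],     "heavy": [[1, 0], [0, 1]]},
    {"light": [[0, 1], [0, 1]],     "heavy": [[0, 1], [0, 1]]},
    {"light": [[1, 0.1], [1, 0.2]], "heavy": [[1, 1], [0, 1]]},
]
ranking = two_stage_retrieve([1, 0], [1, 0], videos, k_frames=1, n_candidates=2)
print(ranking)
```

In this toy setup the second video is discarded by the coarse stage, so the heavier scoring runs on only two of the three candidates, and only on the frames chosen for the query. That is the efficiency argument the abstract makes, shown here under the stated assumptions.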
Related papers
- Prompt Switch: Efficient CLIP Adaptation For Text-video Retrieval (2023)
- Frame-difference Guided Dynamic Region Perception For CLIP Adaptation In Text-video Retrieval (2025)
- Focus, Distinguish, And Prompt: Unleashing CLIP For Efficient And Flexible Scene Text Retrieval (2024)
- An Empirical Study Of Excitation And Aggregation Design Adaptions In CLIP4Clip For Video-text Retrieval (2024)
- CLIP4Clip: An Empirical Study Of CLIP For End To End Video Clip Retrieval (2021)
- TeachCLIP: Multi-grained Teaching For Efficient Text-to-video Retrieval (2023)
- CenterCLIP: Token Clustering For Efficient Text-video Retrieval (2022)
- Efficient Cross-modal Video Retrieval With Meta-optimized Frames (2022)