Dual Learning With Dynamic Knowledge Distillation And Soft Alignment For Partially Relevant Video Retrieval
2025 · Jianfeng Dong, Lei Huang, Daizong Liu, et al.
Abstract
Almost all previous text-to-video retrieval works idealistically assume that videos are pre-trimmed to short durations and contain only text-related content. In practice, however, videos are typically untrimmed and long, with much more complicated background content. Therefore, in this paper, we focus on the more practical yet challenging task of Partially Relevant Video Retrieval (PRVR), which aims to retrieve untrimmed videos that are partially relevant to a given query. To tackle this task, we propose a novel framework that distills generalization knowledge from a powerful large-scale vision-language pre-trained model and transfers it to a lightweight, task-specific PRVR network. Specifically, we introduce a Dual Learning framework with Dynamic Knowledge Distillation (DL-DKD++), in which a large teacher model supervises a compact dual-branch student network. The student model comprises two branches: an inheritance branch that absorbs transferable knowledge from the teacher, …
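To make the teacher-to-student transfer concrete, the sketch below shows one common way an inheritance branch could absorb a teacher's knowledge: matching the teacher's softened text-to-video similarity distribution with a KL-divergence loss. This is a generic soft-label distillation objective, not necessarily the exact loss used in DL-DKD++. The function names `softmax` and `kd_loss`, the temperature `tau`, and the assumption that the teacher (e.g. a CLIP-style vision-language model) and the student both produce a query-by-video similarity matrix are all hypothetical choices for illustration.

```python
import math


def softmax(row, tau):
    """Temperature-scaled softmax over one row of similarity scores."""
    scaled = [x / tau for x in row]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_sims, student_sims, tau=1.0):
    """Hypothetical distillation loss: mean KL(teacher || student) over queries.

    teacher_sims, student_sims: lists of rows, one row per text query,
    each row holding similarity scores against the same candidate videos.
    Assumption: the teacher is frozen and only provides soft targets.
    """
    if len(teacher_sims) != len(student_sims):
        raise ValueError("teacher and student must score the same queries")
    total = 0.0
    for t_row, s_row in zip(teacher_sims, student_sims):
        p = softmax(t_row, tau)  # soft targets from the teacher
        q = softmax(s_row, tau)  # predictions of the inheritance branch
        # KL divergence for this query; skip zero-probability teacher terms
        total += sum(pi * (math.log(pi) - math.log(qi))
                     for pi, qi in zip(p, q) if pi > 0)
    return total / len(teacher_sims)


if __name__ == "__main__":
    teacher = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3]]
    close_student = [[0.8, 0.25, 0.1], [0.15, 0.7, 0.3]]
    far_student = [[0.1, 0.2, 0.9], [0.8, 0.1, 0.3]]
    print(kd_loss(teacher, close_student, tau=0.1))
    print(kd_loss(teacher, far_student, tau=0.1))
```

A student whose similarity ranking agrees with the teacher's incurs a smaller loss than one whose ranking disagrees. A lower `tau` sharpens both distributions, so disagreements on the top-ranked videos are penalized more heavily.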
Related papers
- PRVR: Partially Relevant Video Retrieval (2022)
- Ambiguity-restrained Text-video Representation Learning For Partially Relevant Video Retrieval (2025)
- HLFormer: Enhancing Partially Relevant Video Retrieval With Hyperbolic Learning (2025)
- TeachText: Crossmodal Generalized Distillation For Text-video Retrieval (2021)
- Imagine Before Concentration: Diffusion-guided Registers Enhance Partially Relevant Video Retrieval (2026)
- C2KD: Cross-lingual Cross-modal Knowledge Distillation For Multilingual Text-video Retrieval (2022)
- Mitigating Semantic Collapse In Partially Relevant Video Retrieval (2025)
- GMMFormer: Gaussian-mixture-model Based Transformer For Efficient Partially Relevant Video Retrieval (2023)