Imagine Before Concentration: Diffusion-guided Registers Enhance Partially Relevant Video Retrieval
2026 Β· Jun Li, Xuhang Lou, Jinpeng Wang, et al.
Abstract
Partially Relevant Video Retrieval (PRVR) aims to retrieve untrimmed videos based on text queries that describe only partial events. Existing methods suffer from incomplete global contextual perception, struggling with query ambiguity and local noise induced by spurious responses. To address these issues, we propose DreamPRVR, which adopts a coarse-to-fine representation learning paradigm. The model first generates global contextual semantic registers as coarse-grained highlights spanning the entire video and then concentrates on fine-grained similarity optimization for precise cross-modal matching. Concretely, these registers are generated by initializing from the video-centric distribution produced by a probabilistic variational sampler and then iteratively refined via a text-supervised truncated diffusion model. During this process, textual semantic structure learning constructs a well-formed textual latent space, enhancing the reliability of global perception. The registers are the
Authors
(none)
Tags
Stats
Related papers
- PRVR: Partially Relevant Video Retrieval (2022)2.26
- Ambiguity-restrained Text-video Representation Learning For Partially Relevant Video Retrieval (2025)5.84
- Uneven Event Modeling For Partially Relevant Video Retrieval (2025)1.40
- Mitigating Semantic Collapse In Partially Relevant Video Retrieval (2025)0.00
- Dual Learning With Dynamic Knowledge Distillation And Soft Alignment For Partially Relevant Video Retrieval (2025)2.60
- Prototypes Are Balanced Units For Efficient And Effective Partially Relevant Video Retrieval (2025)0.00
- Gmmformer: Gaussian-mixture-model Based Transformer For Efficient Partially Relevant Video Retrieval (2023)12.06
- Hlformer: Enhancing Partially Relevant Video Retrieval With Hyperbolic Learning (2025)3.50