Prototypes Are Balanced Units For Efficient And Effective Partially Relevant Video Retrieval
2025 · Wonjun Moon, Cheol-Ho Cho, Woojin Jun, et al.
Abstract
In a retrieval system, simultaneously achieving search accuracy and efficiency is inherently challenging. This challenge is particularly pronounced in partially relevant video retrieval (PRVR), where incorporating more diverse context representations at varying temporal scales for each video enhances accuracy but increases computational and memory costs. To address this dichotomy, we propose a prototypical PRVR framework that encodes diverse contexts within a video into a fixed number of prototypes. We then introduce several strategies to enhance text association and video understanding within the prototypes, along with an orthogonal objective to ensure that the prototypes capture a diverse range of content. To keep the prototypes searchable via text queries while accurately encoding video contexts, we implement cross- and uni-modal reconstruction tasks. The cross-modal reconstruction task aligns the prototypes with textual features within a shared space, while the uni-modal reconstruc
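As a rough illustration of the orthogonal objective mentioned in the abstract, the sketch below penalizes pairwise cosine similarity among a video's prototypes so that they spread over different content. This is an assumed formulation for illustration only: the function name `orthogonality_penalty` and the mean-squared off-diagonal form are hypothetical, and the paper's actual loss may differ.

```python
import math

def orthogonality_penalty(prototypes):
    """Mean squared cosine similarity over all distinct prototype pairs.

    `prototypes` is a list of K equal-length vectors (K >= 2).
    Hypothetical formulation; not taken from the paper.
    """
    # L2-normalize each prototype; an all-zero vector is left unchanged.
    unit = []
    for p in prototypes:
        norm = math.sqrt(sum(x * x for x in p)) or 1.0
        unit.append([x / norm for x in p])

    k = len(unit)
    total = 0.0
    for i in range(k):
        for j in range(k):
            if i != j:
                # Cosine similarity between two distinct prototypes.
                cos = sum(a * b for a, b in zip(unit[i], unit[j]))
                total += cos * cos
    # Average over the K * (K - 1) ordered off-diagonal pairs.
    return total / (k * (k - 1))
```

Under this formulation the penalty is zero when the prototypes are mutually orthogonal and grows toward one as they collapse onto the same direction, so minimizing it alongside the retrieval loss would push a fixed prototype budget to cover distinct contexts.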
Related papers
- PRVR: Partially Relevant Video Retrieval (2022) 2.26
- Uneven Event Modeling For Partially Relevant Video Retrieval (2025) 1.40
- Ambiguity-restrained Text-video Representation Learning For Partially Relevant Video Retrieval (2025) 5.84
- Text-adaptive Multiple Visual Prototype Matching For Video-text Retrieval (2022) 4.52
- GMMFormer: Gaussian-mixture-model Based Transformer For Efficient Partially Relevant Video Retrieval (2023) 12.06
- ProPy: Building Interactive Prompt Pyramids Upon CLIP For Partially Relevant Video Retrieval (2025) 1.91
- Imagine Before Concentration: Diffusion-guided Registers Enhance Partially Relevant Video Retrieval (2026) 3.80
- Mitigating Semantic Collapse In Partially Relevant Video Retrieval (2025) 0.00