Mitigating Semantic Collapse In Partially Relevant Video Retrieval
2025 · Wonjun Moon, Minseok Jung, Gilhan Park, et al.
Abstract
Partially Relevant Video Retrieval (PRVR) seeks videos in which only part of the content matches a text query. Existing methods treat every annotated text-video pair as a positive and all other pairs as negatives, ignoring the rich semantic variation both within a single video and across different videos. Consequently, the embeddings of queries and video-clip segments that describe distinct events within the same video collapse together, while the embeddings of semantically similar queries and segments from different videos are driven apart. This limits retrieval performance when videos contain multiple, diverse events. This paper addresses these problems, which it terms semantic collapse, in both the text and video embedding spaces. We first introduce Text Correlation Preservation Learning, which preserves the semantic relationships encoded by the foundation model across text queries. To address collapse in video embeddings, we propose Cross-Branch Video Alignment (CBVA), a con
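The abstract describes Text Correlation Preservation Learning only at a high level, so the sketch below is one plausible reading under stated assumptions, not the authors' formulation. It assumes the regularizer compares the pairwise cosine-similarity matrix of the text encoder being trained against that of a frozen foundation-model text encoder and penalizes their mean squared difference. For contrast it also includes the standard text-to-video InfoNCE objective, which treats only the annotated pair as positive. All function names, the temperature `tau=0.07`, and the choice of mean squared error are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_matrix(embs: List[Vector]) -> Matrix:
    """Pairwise cosine similarities between every pair of rows in `embs`."""
    unit = [l2_normalize(e) for e in embs]
    return [[dot(u, w) for w in unit] for u in unit]


def info_nce(text: List[Vector], video: List[Vector], tau: float = 0.07) -> float:
    """Text-to-video InfoNCE: video i is the sole positive for query i and every
    other video in the batch is a negative (the labeling the abstract criticizes)."""
    t_unit = [l2_normalize(t) for t in text]
    v_unit = [l2_normalize(v) for v in video]
    loss = 0.0
    for i, t in enumerate(t_unit):
        logits = [dot(t, v) / tau for v in v_unit]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        loss += log_sum_exp - logits[i]  # -log softmax probability of the positive pair
    return loss / len(t_unit)


def correlation_preservation_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """HYPOTHETICAL regularizer (not the paper's published formula): mean squared
    difference between off-diagonal query-query cosine similarities from the trained
    text encoder (`student`) and from a frozen foundation-model encoder (`teacher`)."""
    s, t = cosine_matrix(student), cosine_matrix(teacher)
    n = len(s)
    diffs = [(s[i][j] - t[i][j]) ** 2 for i in range(n) for j in range(n) if i != j]
    return sum(diffs) / len(diffs) if diffs else 0.0


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]    # frozen: queries 0 and 1 are related
    collapsed = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # trained: 0 and 1 pushed apart, 1 and 2 merged
    print(correlation_preservation_loss(teacher, teacher))    # 0.0 (structure preserved)
    print(correlation_preservation_loss(collapsed, teacher))  # positive penalty
```

Adding `correlation_preservation_loss` to `info_nce` with some weight would penalize the trained encoder for pulling apart queries the frozen encoder considers related, or for merging queries it considers distinct, which is the text-side collapse the abstract describes. The weighting is likewise an assumption. Because the regularizer compares cosine similarities, it is invariant to the scale of the embeddings.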
Related papers
- Ambiguity-restrained Text-video Representation Learning For Partially Relevant Video Retrieval (2025) · 5.84
- PRVR: Partially Relevant Video Retrieval (2022) · 2.26
- Uneven Event Modeling For Partially Relevant Video Retrieval (2025) · 1.40
- Imagine Before Concentration: Diffusion-guided Registers Enhance Partially Relevant Video Retrieval (2026) · 3.80
- MamFusion: Multi-mamba With Temporal Fusion For Partially Relevant Video Retrieval (2025) · 1.69
- HLFormer: Enhancing Partially Relevant Video Retrieval With Hyperbolic Learning (2025) · 3.50
- GMMFormer: Gaussian-mixture-model Based Transformer For Efficient Partially Relevant Video Retrieval (2023) · 12.06
- Prototypes Are Balanced Units For Efficient And Effective Partially Relevant Video Retrieval (2025) · 0.00