Ambiguity-restrained Text-video Representation Learning For Partially Relevant Video Retrieval
2025 · Ch Cho, Wj Moon, W Jun, et al.
Abstract
Partially Relevant Video Retrieval (PRVR) aims to retrieve a video where a specific segment is relevant to a given text query. Typical training processes of PRVR assume a one-to-one relationship where each text query is relevant to only one video. However, we point out the inherent ambiguity between text and video content based on their conceptual scope and propose a framework that incorporates this ambiguity into the model learning process. Specifically, we propose Ambiguity-Restrained representation Learning (ARL) to address ambiguous text-video pairs. Initially, ARL detects ambiguous pairs based on two criteria: uncertainty and similarity. Uncertainty represents whether instances include commonly shared context across the dataset, while similarity indicates pair-wise semantic overlap. Then, with the detected ambiguous pairs, our ARL hierarchically learns the semantic relationship via multi-positive contrastive learning and dual triplet margin loss. Additionally, we delve into fine-g
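The idea of holding ambiguous pairs out of the negative set can be illustrated with a small sketch. This is not the authors' implementation: the labeling rule, the thresholds, the function names `label_pairs` and `restrained_nce`, and the toy similarity and uncertainty values are all assumptions made for illustration, and the loss is a plain InfoNCE with ambiguous pairs masked out rather than the multi-positive contrastive and dual triplet margin losses named in the abstract.

```python
import math

def label_pairs(sim, unc, sim_thresh=0.5, unc_thresh=0.5):
    """Label each (text i, video j) pair as 'pos', 'amb', or 'neg'.

    Assumption: an unpaired (off-diagonal) pair counts as ambiguous when its
    similarity or its uncertainty exceeds a threshold. The paper's exact
    detection rule may differ.
    """
    n = len(sim)
    labels = [["neg"] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                labels[i][j] = "pos"  # annotated text-video pair
            elif sim[i][j] > sim_thresh or unc[i][j] > unc_thresh:
                labels[i][j] = "amb"  # neither pulled together nor pushed apart
    return labels

def restrained_nce(sim, labels, tau=0.1):
    """Mean InfoNCE over text queries; only 'neg' pairs enter the denominator."""
    total = 0.0
    for i, row in enumerate(sim):
        pos = math.exp(row[i] / tau)
        neg = sum(math.exp(s / tau) for s, lab in zip(row, labels[i]) if lab == "neg")
        total += -math.log(pos / (pos + neg))
    return total / len(sim)

# Toy text-to-video similarities: text 0 also resembles video 1.
sim = [[0.9, 0.8, 0.1],
       [0.2, 0.9, 0.1],
       [0.1, 0.1, 0.9]]
unc = [[0.0] * 3 for _ in range(3)]  # hypothetical uncertainty scores
labels = label_pairs(sim, unc)
loss = restrained_nce(sim, labels)
```

In this toy setup the pair (text 0, video 1) is labeled ambiguous, so it no longer contributes to the denominator for text 0, and the loss is lower than it would be under the one-to-one assumption that treats every unpaired video as a negative.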
Related papers
- PRVR: Partially Relevant Video Retrieval (2022) · 2.26
- Uneven Event Modeling For Partially Relevant Video Retrieval (2025) · 1.40
- Mitigating Semantic Collapse In Partially Relevant Video Retrieval (2025) · 0.00
- Dual Learning With Dynamic Knowledge Distillation And Soft Alignment For Partially Relevant Video Retrieval (2025) · 2.60
- Imagine Before Concentration: Diffusion-guided Registers Enhance Partially Relevant Video Retrieval (2026) · 3.80
- UATVR: Uncertainty-adaptive Text-video Retrieval (2023) · 15.46
- GMMFormer: Gaussian-mixture-model Based Transformer For Efficient Partially Relevant Video Retrieval (2023) · 12.06
- HLFormer: Enhancing Partially Relevant Video Retrieval With Hyperbolic Learning (2025) · 3.50