HLFormer: Enhancing Partially Relevant Video Retrieval With Hyperbolic Learning
2025 · Jun Li, Jinpeng Wang, Chaolei Tan, et al.
Abstract
Partially Relevant Video Retrieval (PRVR) addresses the critical challenge of matching untrimmed videos with text queries describing only partial content. Existing methods suffer from geometric distortion in Euclidean space that sometimes misrepresents the intrinsic hierarchical structure of videos and overlooks certain hierarchical semantics, ultimately leading to suboptimal temporal modeling. To address this issue, we propose the first hyperbolic modeling framework for PRVR, namely HLFormer, which leverages hyperbolic space learning to compensate for the suboptimal hierarchical modeling capabilities of Euclidean space. Specifically, HLFormer integrates the Lorentz Attention Block and Euclidean Attention Block to encode video embeddings in hybrid spaces, using the Mean-Guided Adaptive Interaction Module to dynamically fuse features. Additionally, we introduce a Partial Order Preservation Loss to enforce "text < video" hierarchy through Lorentzian cone constraints. This approach furthe
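The abstract does not spell out the hyperbolic operations, so the following is only a minimal sketch of the standard Lorentz (hyperboloid) model that such methods build on, not the authors' implementation. It assumes unit curvature -1, and the function names (`lorentz_inner`, `exp_map_origin`, `lorentz_distance`) are hypothetical helpers chosen for illustration. The Lorentzian cone constraint used by the Partial Order Preservation Loss is not sketched, since its formula is not given here.

```python
import math

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def exp_map_origin(v):
    """Lift a Euclidean vector v (a tangent vector at the hyperboloid
    origin) onto the unit hyperboloid {x : <x, x>_L = -1, x0 > 0}."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        # The zero tangent vector maps to the origin (1, 0, ..., 0).
        return [1.0] + [0.0] * len(v)
    scale = math.sinh(norm) / norm
    return [math.cosh(norm)] + [scale * a for a in v]

def lorentz_distance(x, y):
    """Geodesic distance arccosh(-<x, y>_L); the clamp guards against
    rounding that pushes the argument slightly below 1."""
    return math.acosh(max(1.0, -lorentz_inner(x, y)))

# Two toy Euclidean embeddings (illustrative values, not real features).
a = exp_map_origin([0.1, 0.0])
b = exp_map_origin([0.8, 0.3])
print(lorentz_inner(b, b))     # close to -1: b lies on the hyperboloid
print(lorentz_distance(a, b))  # hyperbolic distance between the two points
```

Under this model, the distance from the origin to a lifted point equals the Euclidean norm of the original vector, which is one reason the origin-based exponential map is a common way to move Euclidean features into hyperbolic space.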
Related papers
- GMMFormer: Gaussian-mixture-model Based Transformer For Efficient Partially Relevant Video Retrieval (2023) · 12.06
- PRVR: Partially Relevant Video Retrieval (2022) · 2.26
- Ambiguity-restrained Text-video Representation Learning For Partially Relevant Video Retrieval (2025) · 5.84
- Dual Learning With Dynamic Knowledge Distillation And Soft Alignment For Partially Relevant Video Retrieval (2025) · 2.60
- Mitigating Semantic Collapse In Partially Relevant Video Retrieval (2025) · 0.00
- Uneven Event Modeling For Partially Relevant Video Retrieval (2025) · 1.40
- ProPy: Building Interactive Prompt Pyramids Upon CLIP For Partially Relevant Video Retrieval (2025) · 1.91
- Prototypes Are Balanced Units For Efficient And Effective Partially Relevant Video Retrieval (2025) · 0.00