Object-centric Framework For Video Moment Retrieval
2025 · Zongyao Li, Yongkang Wong, Satoshi Yamazaki, et al.
Abstract
Most existing video moment retrieval methods rely on temporal sequences of frame- or clip-level features that primarily encode global visual and semantic information. However, such representations often fail to capture fine-grained object semantics and appearance, which are crucial for localizing moments described by object-oriented queries involving specific entities and their interactions. In particular, temporal dynamics at the object level have been largely overlooked, limiting the effectiveness of existing approaches in scenarios requiring detailed object-level reasoning. To address this limitation, we propose a novel object-centric framework for moment retrieval. Our method first extracts query-relevant objects using a scene graph parser and then generates scene graphs from video frames to represent these objects and their relationships. Based on the scene graphs, we construct object-level feature sequences that encode rich visual and semantic information. These sequences are pro
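The object-level feature sequences mentioned in the abstract can be illustrated with a minimal sketch. It assumes the query-relevant object names are already available (the abstract obtains them with a scene graph parser) and that each frame yields a list of labelled object features. The function name `build_object_sequences`, the `(label, feature)` frame layout, and the zero-padding for frames where an object is absent are all hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical data layout: each frame is a list of (label, feature) pairs,
# such as a scene-graph generator might emit. Not the paper's actual format.
Frame = List[Tuple[str, Sequence[float]]]


def build_object_sequences(
    frames: List[Frame], query_objects: List[str], dim: int
) -> Dict[str, List[List[float]]]:
    """Return one feature sequence (one entry per frame) per query object."""
    sequences: Dict[str, List[List[float]]] = {name: [] for name in query_objects}
    for frame in frames:
        # Keep the first detection per label in this frame (a simplification).
        seen: Dict[str, List[float]] = {}
        for label, feat in frame:
            if label in sequences and label not in seen:
                seen[label] = list(feat)
        for name in query_objects:
            # Assumption: frames without the object get a zero vector.
            sequences[name].append(seen.get(name, [0.0] * dim))
    return sequences


# Toy input: three frames with 2-dimensional object features.
frames = [
    [("person", [1.0, 0.0]), ("cup", [0.0, 1.0])],
    [("person", [0.5, 0.5])],
    [("dog", [0.2, 0.2]), ("cup", [0.1, 0.9])],
]
seqs = build_object_sequences(frames, ["person", "cup"], dim=2)
```

Each query object ends up with a sequence aligned to the frame index, so a temporal model could consume it in the same way as a frame- or clip-level feature sequence. Objects not named in the query (here `dog`) are dropped.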