MomentSeeker: A Task-oriented Benchmark For Long-video Moment Retrieval
2025 · Huaying Yuan, Jian Ni, Zheng Liu, et al.
Abstract
Accurately locating key moments within long videos is crucial for solving long-video understanding (LVU) tasks. However, existing benchmarks are either severely limited in video length and task diversity, or they focus solely on end-to-end LVU performance, making them unsuitable for evaluating whether key moments can be accurately accessed. To address this challenge, we propose MomentSeeker, a novel benchmark for long-video moment retrieval (LVMR), distinguished by the following features. First, it is built on long and diverse videos, averaging over 1200 seconds in duration and collected from various domains, e.g., movie, anomaly, egocentric, and sports. Second, it covers a variety of real-world scenarios at three levels (global-level, event-level, and object-level), spanning common tasks such as action recognition, object localization, and causal reasoning. Third, it incorporates rich forms of queries, including text-only queries, image-conditioned queries, and vi
Related papers
- LoVR: A Benchmark For Long Video Retrieval In Multimodal Contexts (2025)
- MUVR: A Multi-modal Untrimmed Video Retrieval Benchmark With Multi-level Visual Correspondence (2025)
- When One Moment Isn't Enough: Multi-moment Retrieval With Cross-moment Interactions (2025)
- Semantic Video Moments Retrieval At Scale: A New Task And A Baseline (2022)
- A Lightweight Moment Retrieval System With Global Re-ranking And Robust Adaptive Bidirectional Temporal Search (2025)
- ViSeRet: A Simple Yet Effective Approach To Moment Retrieval Via Fine-grained Video Segmentation (2021)
- Context-enhanced Video Moment Retrieval With Large Language Models (2024)
- Frame-wise Cross-modal Matching For Video Moment Retrieval (2020)