Who Can We Trust? Scope-aware Video Moment Retrieval With Multi-agent Conflict
2025 · Chaochen Wu, Guan Luo, Meiyun Zuo, et al.
Abstract
Video moment retrieval uses a text query to locate a moment in a given untrimmed reference video. Locating the corresponding video moments with text queries helps people interact with videos efficiently. Current solutions for this task do not consider conflicts among the localization results of different models, so multiple models cannot be integrated properly to produce better results. This study introduces a reinforcement learning-based video moment retrieval model that scans the whole video once to find the moment's boundaries while producing evidence for its localization. Moreover, we propose a multi-agent system framework that uses evidential learning to resolve conflicts between the agents' localization outputs. As a by-product of observing and resolving conflicts between agents, we can decide whether a query has no corresponding moment in a video (out-of-scope) without additional training, which suits real-world applications. Extensive experiments on benchmark datasets show the
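The conflict-handling idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes each agent emits non-negative evidence over a fixed set of candidate video segments, converts that evidence into a subjective-logic opinion (belief per segment plus an uncertainty mass), and fuses two opinions with the reduced Dempster combination rule common in evidential deep learning. The function names, the segment discretization, and the `max_conflict` threshold used for the out-of-scope decision are all hypothetical.

```python
from typing import List, Tuple

# An opinion is (belief per candidate segment, uncertainty mass).
# Beliefs and uncertainty sum to 1.
Opinion = Tuple[List[float], float]


def opinion_from_evidence(evidence: List[float]) -> Opinion:
    """Turn non-negative evidence into a subjective-logic opinion.

    Uses Dirichlet parameters alpha_k = e_k + 1, so the strength is
    S = sum(e) + K, belief is e_k / S and uncertainty is K / S.
    """
    k = len(evidence)
    strength = sum(evidence) + k
    return [e / strength for e in evidence], k / strength


def conflict(a: Opinion, b: Opinion) -> float:
    """Belief mass the two agents place on different segments."""
    beliefs_a, _ = a
    beliefs_b, _ = b
    agreement = sum(x * y for x, y in zip(beliefs_a, beliefs_b))
    return sum(beliefs_a) * sum(beliefs_b) - agreement


def combine(a: Opinion, b: Opinion) -> Opinion:
    """Fuse two opinions with the reduced Dempster combination rule."""
    beliefs_a, u_a = a
    beliefs_b, u_b = b
    norm = 1.0 - conflict(a, b)  # positive because both uncertainties are > 0
    fused = [
        (x * y + x * u_b + y * u_a) / norm
        for x, y in zip(beliefs_a, beliefs_b)
    ]
    return fused, (u_a * u_b) / norm


def is_out_of_scope(a: Opinion, b: Opinion, max_conflict: float = 0.5) -> bool:
    """Hypothetical rule: strong disagreement suggests no matching moment."""
    return conflict(a, b) > max_conflict


if __name__ == "__main__":
    agent_1 = opinion_from_evidence([9.0, 0.0, 0.0])
    agent_2 = opinion_from_evidence([8.0, 1.0, 0.0])
    agent_3 = opinion_from_evidence([0.0, 0.0, 9.0])
    print(combine(agent_1, agent_2))
    print(is_out_of_scope(agent_1, agent_2), is_out_of_scope(agent_1, agent_3))
```

When two agents support the same segment, fusion lowers the uncertainty mass below either agent's own. When they support different segments, the conflict term grows, and this sketch treats that as a signal that the query may have no matching moment. The threshold is illustrative and would need tuning on held-out data.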
Related papers
- Towards Efficient And Robust Moment Retrieval System: A Unified Framework For Multi-granularity Models And Temporal Reranking (2025) · 2.26
- Viseret: A Simple Yet Effective Approach To Moment Retrieval Via Fine-grained Video Segmentation (2021) · 0.00
- Llandmark: A Multi-agent Framework For Landmark-aware Multimodal Interactive Video Retrieval (2026) · 0.00
- Frame-wise Cross-modal Matching For Video Moment Retrieval (2020) · 13.17
- Unified Interactive Multimodal Moment Retrieval Via Cascaded Embedding-reranking And Temporal-aware Score Fusion (2025) · 0.00
- Video Moment Retrieval With Text Query Considering Many-to-many Correspondence Using Potentially Relevant Pair (2021) · 0.00
- Hybrid-learning Video Moment Retrieval Across Multi-domain Labels (2024) · 0.00
- Momentseeker: A Task-oriented Benchmark For Long-video Moment Retrieval (2025) · 0.00