Abstract

Existing Moment retrieval (MR) methods focus on Single-Moment Retrieval (SMR). However, one query can correspond to multiple relevant moments in real-world applications. This makes the existing datasets and methods insufficient for video temporal grounding. By revisiting the gap between current MR tasks and real-world applications, we introduce a high-quality datasets called QVHighlights Multi-Moment Dataset (QV-M\(^2\)), along with new evaluation metrics tailored for multi-moment retrieval (MMR). QV-M\(^2\) consists of 2,212 annotations covering 6,384 video segments. Building on existing efforts in MMR, we propose a framework called FlashMMR. Specifically, we propose a Multi-moment Post-verification module to refine the moment boundaries. We introduce constrained temporal adjustment and subsequently leverage a verification module to re-evaluate the candidate segments. Through this sophisticated filtering pipeline, low-confidence proposals are pruned, and robust multi-moment alignment

Authors

(none)

Tags

  • Image Retrieval

Stats

  • citations0
  • S2 citationsβ€”
  • github stars7
  • HF likes0
  • heat score1.81
  • arxiv keycao2025when

Related papers