Video Moment Retrieval With Text Query Considering Many-to-many Correspondence Using Potentially Relevant Pair
2021 · Sho Maeoki, Yusuke Mukuta, Tatsuya Harada
Abstract
In this paper, we undertake the task of text-based video moment retrieval from a corpus of videos. To train the model, text-moment paired datasets are used to learn the correct correspondences. In typical training methods, ground-truth text-moment pairs are used as positive pairs, whereas all other pairs are regarded as negative pairs. However, aside from the ground-truth pairs, some text-moment pairs should be regarded as positive. In this case, one text annotation can be positive for many video moments. Conversely, one video moment can correspond to many text annotations. Thus, there are many-to-many correspondences between the text annotations and video moments. Based on these correspondences, we can form potentially relevant pairs, which are not given as ground truth yet are not negative; effectively incorporating such relevant pairs into training can improve the retrieval performance. The text query should describe what is happening in a video moment. Hence, different video momen
Related papers
- Viseret: A Simple Yet Effective Approach To Moment Retrieval Via Fine-grained Video Segmentation (2021)
- Hybrid-learning Video Moment Retrieval Across Multi-domain Labels (2024)
- Improving Video Corpus Moment Retrieval With Partial Relevance Enhancement (2024)
- Frame-wise Cross-modal Matching For Video Moment Retrieval (2020)
- Multi-query Video Retrieval (2022)
- Text-adaptive Multiple Visual Prototype Matching For Video-text Retrieval (2022)
- Who Can We Trust? Scope-aware Video Moment Retrieval With Multi-agent Conflict (2025)
- Selective Query-guided Debiasing For Video Corpus Moment Retrieval (2022)