Text-based Localization Of Moments In A Video Corpus
2020 · Sudipta Paul, Niluthpol Chowdhury Mithun, Amit K. Roy-Chowdhury
Abstract
Prior works on text-based video moment localization focus on temporally grounding the textual query in an untrimmed video. These works assume that the relevant video is already known and attempt to localize the moment in that video only. Unlike such works, we relax this assumption and address the task of localizing moments in a corpus of videos for a given sentence query. This task poses a unique challenge, as the system is required to perform: (i) retrieval of the relevant video, where only a segment of the video corresponds to the queried sentence, and (ii) temporal localization of the moment in the relevant video based on the sentence query. To overcome this challenge, we propose the Hierarchical Moment Alignment Network (HMAN), which learns an effective joint embedding space for moments and sentences. In addition to learning subtle differences between intra-video moments, HMAN focuses on distinguishing inter-video global semantic concepts based on sentence queries. Quali
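As a rough illustration of the corpus-level task described in the abstract, the sketch below ranks every candidate moment from every video against a sentence query in a shared embedding space. It is not the HMAN model: the embeddings are assumed to be precomputed by some encoder, and the names `cosine`, `localize_in_corpus`, and the toy corpus layout are hypothetical choices made only for this example.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: each moment is (start_sec, end_sec, embedding).
Moment = Tuple[float, float, List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def localize_in_corpus(
    query_vec: List[float],
    corpus: Dict[str, List[Moment]],
    top_k: int = 1,
) -> List[Tuple[float, str, float, float]]:
    """Score all moments of all videos against the query embedding.

    Returns the top_k results as (score, video_id, start_sec, end_sec),
    so video retrieval and temporal localization come from one ranking.
    """
    scored = []
    for video_id, moments in corpus.items():
        for start, end, emb in moments:
            scored.append((cosine(query_vec, emb), video_id, start, end))
    # Highest similarity first.
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 2-D embeddings, assumed to come from a shared moment/sentence space.
    toy_corpus = {
        "v1": [(0.0, 5.0, [1.0, 0.0]), (5.0, 10.0, [0.0, 1.0])],
        "v2": [(0.0, 4.0, [0.7, 0.7])],
    }
    print(localize_in_corpus([0.0, 1.0], toy_corpus, top_k=2))
```

Ranking moments from all videos in a single list means the best-scoring entry identifies both the video and the time span, which mirrors the two sub-tasks (i) and (ii) above. A real system would replace the exhaustive loop with learned encoders and a more efficient search.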
Related papers
- Hanet: Hierarchical Alignment Networks For Video-text Retrieval (2021)
- Frame-wise Cross-modal Matching For Video Moment Retrieval (2020)
- Vlanet: Video-language Alignment Network For Weakly-supervised Video Moment Retrieval (2020)
- Hybrid-learning Video Moment Retrieval Across Multi-domain Labels (2024)
- Video Moment Retrieval With Text Query Considering Many-to-many Correspondence Using Potentially Relevant Pair (2021)
- Disentangle And Denoise: Tackling Context Misalignment For Video Moment Retrieval (2024)
- A Lightweight Moment Retrieval System With Global Re-ranking And Robust Adaptive Bidirectional Temporal Search (2025)
- Logan: Latent Graph Co-attention Network For Weakly-supervised Video Moment Retrieval (2019)