Coarse To Fine: Video Retrieval Before Moment Localization
2021 · Zijian Gao, Huanyu Liu, Jingyu Liu
Abstract
Current state-of-the-art methods for video corpus moment retrieval (VCMR) often use a similarity-based feature alignment approach for the sake of convenience and speed. However, late fusion methods such as cosine similarity alignment cannot make full use of the information in both the query texts and the videos. In this paper, we combine feature alignment with feature fusion to improve performance on VCMR.
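As a rough illustration of the coarse-to-fine idea in the title, the sketch below first shortlists videos by cosine similarity (the alignment stage) and then scores individual clips of the shortlisted videos with a joint query-clip function (the fusion stage). It is not the authors' model: the function names, the toy vectors, and the hand-written `fused_score` are all assumptions standing in for learned encoders and a learned cross-modal fusion network.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coarse_retrieve(query_vec, video_vecs, top_k):
    """Stage 1 (alignment): rank video ids by cosine similarity, keep top_k."""
    ranked = sorted(
        video_vecs,
        key=lambda vid: cosine(query_vec, video_vecs[vid]),
        reverse=True,
    )
    return ranked[:top_k]


def fused_score(query_vec, clip_vec):
    """Stage 2 (fusion) stand-in: a toy score over the element-wise product.

    Hypothetical placeholder; a real system would use a learned network
    that takes the query and clip features jointly.
    """
    return sum(max(0.0, q * c) for q, c in zip(query_vec, clip_vec))


def localize(query_vec, video_clips, candidates):
    """Score each clip of the shortlisted videos; return (video, clip, score)."""
    best = None
    for vid in candidates:
        for idx, clip_vec in enumerate(video_clips[vid]):
            score = fused_score(query_vec, clip_vec)
            if best is None or score > best[2]:
                best = (vid, idx, score)
    return best


if __name__ == "__main__":
    # Toy 2-d features, invented purely for illustration.
    query = [1.0, 0.0]
    video_vecs = {"v1": [0.9, 0.1], "v2": [0.0, 1.0], "v3": [0.7, 0.7]}
    video_clips = {
        "v1": [[0.2, 0.8], [1.0, 0.0]],
        "v2": [[0.0, 1.0]],
        "v3": [[0.5, 0.5], [0.6, 0.4]],
    }
    shortlist = coarse_retrieve(query, video_vecs, top_k=2)
    print(shortlist, localize(query, video_clips, shortlist))
```

The cheap similarity pass touches every video once, while the more expensive joint scoring runs only on clips from the shortlist, which is the usual motivation for ordering the two stages this way.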
Related papers
- Video Corpus Moment Retrieval With Contrastive Learning (2021) 14.35
- Improving Video Corpus Moment Retrieval With Partial Relevance Enhancement (2024) 7.89
- Frame-wise Cross-modal Matching For Video Moment Retrieval (2020) 13.17
- VLANet: Video-language Alignment Network For Weakly-supervised Video Moment Retrieval (2020) 13.28
- Towards Balanced Alignment: Modal-enhanced Semantic Modeling For Video Moment Retrieval (2023) 14.33
- Context-enhanced Video Moment Retrieval With Large Language Models (2024) 5.84
- Audio Does Matter: Importance-aware Multi-granularity Fusion For Video Moment Retrieval (2025) 4.49
- ViSeRet: A Simple Yet Effective Approach To Moment Retrieval Via Fine-grained Video Segmentation (2021) 0.00