On Semantic Similarity In Video Retrieval
2021 · Michael Wray, Hazel Doughty, Dima Damen
Abstract
Current video retrieval efforts all found their evaluation on an instance-based assumption, that only a single caption is relevant to a query video and vice versa. We demonstrate that this assumption results in performance comparisons often not indicative of models' retrieval capabilities. We propose a move to semantic similarity video retrieval, where (i) multiple videos/captions can be deemed equally relevant, and their relative ranking does not affect a method's reported performance and (ii) retrieved videos/captions are ranked by their similarity to a query. We propose several proxies to estimate semantic similarities in large-scale retrieval datasets, without additional annotations. Our analysis is performed on three commonly used video retrieval datasets (MSR-VTT, YouCook2 and EPIC-KITCHENS).
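Below is a minimal Python sketch of the contrast the abstract draws. It assumes a bag-of-words intersection-over-union between captions as the semantic similarity proxy and normalised discounted cumulative gain (nDCG) as the similarity-aware ranking metric. The paper's own proxies and metric may differ in detail. The function names (`bow_similarity`, `dcg`, `ndcg`, `recall_at_k`) and the toy captions are hypothetical. Tokenisation is deliberately naive (no stop-word removal or lemmatisation), and captions are identified by their text for brevity. Relevance for a query video is taken as the similarity between each retrieved caption and the query video's own caption.

```python
import math
import re


def bow_similarity(caption_a: str, caption_b: str) -> float:
    """Hypothetical semantic proxy: IoU of the two captions' lowercase word sets."""
    words_a = set(re.findall(r"[a-z]+", caption_a.lower()))
    words_b = set(re.findall(r"[a-z]+", caption_b.lower()))
    if not words_a and not words_b:
        return 1.0  # two empty captions are treated as identical
    return len(words_a & words_b) / len(words_a | words_b)


def dcg(gains: list[float]) -> float:
    """Discounted cumulative gain: 0-based rank r is discounted by log2(r + 2)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg(ranked: list[str], query_caption: str, gallery: list[str]) -> float:
    """nDCG of a retrieved caption list, with relevance = similarity to the
    query video's own caption. Equally similar captions give equal gain,
    so swapping them leaves the score unchanged."""
    gains = [bow_similarity(query_caption, c) for c in ranked]
    ideal = sorted((bow_similarity(query_caption, c) for c in gallery), reverse=True)
    ideal_dcg = dcg(ideal[: len(gains)])
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def recall_at_k(ranked: list[str], ground_truth: str, k: int) -> float:
    """Instance-based Recall@k: only the single paired caption counts."""
    return 1.0 if ground_truth in ranked[:k] else 0.0


gallery = [
    "a man slices a tomato",  # paired (ground-truth) caption of the query video
    "man slices a tomato",    # same word set, but a different caption instance
    "a dog runs outside",
]
query_caption = gallery[0]

model_a = [gallery[0], gallery[1], gallery[2]]
model_b = [gallery[1], gallery[0], gallery[2]]  # swaps two equally relevant captions

for name, ranking in [("A", model_a), ("B", model_b)]:
    print(name, recall_at_k(ranking, query_caption, k=1), ndcg(ranking, query_caption, gallery))
# A 1.0 1.0
# B 0.0 1.0
```

Under instance-based Recall@1, model B scores zero for ranking an equally valid caption first. Under the similarity-based nDCG both models score 1.0, matching points (i) and (ii) of the abstract: equally relevant items can be ordered freely, and retrieved items are rewarded by their similarity to the query.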
Related papers
- Semantic Video Moments Retrieval At Scale: A New Task And A Baseline (2022)
- Learning Video Retrieval Models With Relevance-aware Online Mining (2022)
- Multiple Visual-semantic Embedding For Video Retrieval From Query Sentence (2020)
- Convis-bench: Estimating Video Similarity Through Semantic Concepts (2025)
- ICSVR: Investigating Compositional And Syntactic Understanding In Video Retrieval Models (2023)
- Exploiting Semantic Role Contextualized Video Features For Multi-instance Text-video Retrieval EPIC-KITCHENS-100 Multi-instance Retrieval Challenge 2022 (2022)
- Multimodal Lengthy Videos Retrieval Framework And Evaluation Metric (2025)
- Fighting Fire With FIRE: Assessing The Validity Of Text-to-video Retrieval Benchmarks (2022)