EgoCVR: An Egocentric Benchmark For Fine-grained Composed Video Retrieval
2024 · Thomas Hummel, Shyamgopal Karthik, Mariana-Iuliana Georgescu, et al.
Abstract
In Composed Video Retrieval, a video and a textual description which modifies the video content are provided as inputs to the model. The aim is to retrieve the relevant video with the modified content from a database of videos. In this challenging task, the first step is to acquire large-scale training datasets and collect high-quality benchmarks for evaluation. In this work, we introduce EgoCVR, a new evaluation benchmark for fine-grained Composed Video Retrieval using large-scale egocentric video datasets. EgoCVR consists of 2,295 queries that specifically focus on high-quality temporal video understanding. We find that existing Composed Video Retrieval frameworks do not achieve the necessary high-quality temporal video understanding for this task. To address this shortcoming, we adapt a simple training-free method, propose a generic re-ranking framework for Composed Video Retrieval, and demonstrate that this achieves strong results on EgoCVR. Our code and benchmark are freely availa
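The two-stage idea of re-ranking for Composed Video Retrieval can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy two-dimensional embeddings, the choice of which similarity drives which stage, and the `shortlist_size` parameter are all assumptions made for illustration. The sketch first shortlists gallery videos by visual similarity to the query video, then re-orders the shortlist by similarity to the modification text.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(query_video, query_text, gallery, shortlist_size=2):
    """Hypothetical two-stage retrieval over precomputed embeddings.

    gallery maps a video id to {"visual": [...], "text": [...]}.
    Both embedding types are assumed to come from some pretrained encoder.
    """
    # Stage 1: keep the videos that look most like the query video.
    by_visual = sorted(
        gallery,
        key=lambda vid: cosine(query_video, gallery[vid]["visual"]),
        reverse=True,
    )
    shortlist = by_visual[:shortlist_size]
    # Stage 2: re-order the shortlist by agreement with the modification text.
    return sorted(
        shortlist,
        key=lambda vid: cosine(query_text, gallery[vid]["text"]),
        reverse=True,
    )


# Toy gallery with made-up embeddings.
gallery = {
    "clip_a": {"visual": [1.0, 0.0], "text": [0.0, 1.0]},
    "clip_b": {"visual": [0.9, 0.1], "text": [1.0, 0.0]},
    "clip_c": {"visual": [0.0, 1.0], "text": [1.0, 0.0]},
}
ranking = rerank([1.0, 0.0], [1.0, 0.0], gallery, shortlist_size=2)
print(ranking)
```

In this toy example, `clip_c` matches the text but is dropped in the first stage because it looks nothing like the query video, while `clip_b` overtakes `clip_a` in the second stage because it matches the modification text.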
Related papers
- Composed Video Retrieval Via Enriched Context And Discriminative Embeddings (2024)
- From Play To Replay: Composed Video Retrieval For Temporally Fine-grained Videos (2025)
- CoVR-R: Reason-aware Composed Video Retrieval (2026)
- Beyond Simple Edits: Composed Video Retrieval With Dense Modifications (2025)
- ICSVR: Investigating Compositional And Syntactic Understanding In Video Retrieval Models (2023)
- Retrieval-augmented Egocentric Video Captioning (2024)
- PREGEN: Uncovering Latent Thoughts In Composed Video Retrieval (2026)
- LoVR: A Benchmark For Long Video Retrieval In Multimodal Contexts (2025)