VRAG: Region Attention Graphs For Content-based Video Retrieval
2022 · Kennard Ng, Ser-Nam Lim, Gim Hee Lee
Abstract
Content-based Video Retrieval (CBVR) is used on media-sharing platforms for applications such as video recommendation and filtering. To manage databases that scale to billions of videos, video-level approaches that use fixed-size embeddings are preferred for their efficiency. In this paper, we introduce Video Region Attention Graph Networks (VRAG), which improve the state of the art among video-level methods. We represent videos at a finer granularity via region-level features and encode video spatio-temporal dynamics through region-level relations. VRAG captures the relationships between regions based on their semantic content via self-attention and the permutation-invariant aggregation of graph convolution. In addition, we show that the performance gap between video-level and frame-level methods can be reduced by segmenting videos into shots and using shot embeddings for video retrieval. We evaluate our VRAG over several video retrieval tasks and achieve a new state-of-the-art for
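The abstract describes the pipeline only at a high level. As a rough illustration, the following NumPy sketch treats every region of every frame as a graph node, weights the edges between nodes with scaled dot-product self-attention over their features, applies graph-convolution layers, and mean-pools the nodes into one fixed-size, L2-normalised vector. It also computes one such vector per shot. Everything here is an assumption for illustration: the function names (`region_graph_embedding`, `shot_embeddings`, `shot_level_similarity`), the random weights, single-head attention, mean pooling, given shot boundaries, and max-over-shot-pairs scoring are hypothetical and are not the authors' architecture, training setup, or shot-matching rule. Random arrays stand in for region features that a visual backbone would extract.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_adjacency(h: np.ndarray, w_q: np.ndarray, w_k: np.ndarray) -> np.ndarray:
    # Edge weights between every pair of region nodes, computed from their
    # content with scaled dot-product self-attention; each row sums to 1.
    q, k = h @ w_q, h @ w_k
    return softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)


def region_graph_embedding(regions: np.ndarray, layers: list) -> np.ndarray:
    """regions: (num_nodes, d) region features pooled from all frames (or one shot).

    layers: list of (w_q, w_k, w) weight triples, one per graph layer.
    Hypothetical random weights here, not trained VRAG parameters.
    """
    h = regions
    for w_q, w_k, w in layers:
        a = attention_adjacency(h, w_q, w_k)
        # Graph convolution: attention-weighted neighbour aggregation,
        # linear projection, then ReLU.
        h = np.maximum(a @ h @ w, 0.0)
    # Permutation-invariant readout: the mean over nodes is a fixed-size vector.
    v = h.mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-12)


def shot_embeddings(video: np.ndarray, boundaries: list, layers: list) -> np.ndarray:
    # video: (T, R, d) = frames x regions per frame x feature size.
    # boundaries: start frame of each shot (hypothetical; assumed given).
    ends = list(boundaries[1:]) + [video.shape[0]]
    d = video.shape[2]
    embs = [region_graph_embedding(video[s:e].reshape(-1, d), layers)
            for s, e in zip(boundaries, ends)]
    return np.stack(embs)


def shot_level_similarity(shots_a: np.ndarray, shots_b: np.ndarray) -> float:
    # Hypothetical scoring: cosine similarity of the best-matching shot pair
    # (embeddings are already unit-normalised, so a dot product suffices).
    return float((shots_a @ shots_b.T).max())


rng = np.random.default_rng(0)
T, R, D = 6, 4, 16  # frames, regions per frame, feature size (illustrative)
layers = [tuple(rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
          for _ in range(2)]
video = rng.standard_normal((T, R, D))

nodes = video.reshape(-1, D)
emb = region_graph_embedding(nodes, layers)
perm = rng.permutation(nodes.shape[0])
print(np.allclose(emb, region_graph_embedding(nodes[perm], layers)))  # True

shots = shot_embeddings(video, [0, 3], layers)
print(shots.shape)  # (2, 16)
print(round(shot_level_similarity(shots, shots), 6))  # 1.0
```

Shuffling the nodes permutes the rows and columns of the attention adjacency in step with the features, and the mean readout ignores node order, so the embedding is unchanged; this is the permutation invariance the abstract attributes to graph-convolution aggregation. Computing one vector per shot instead of one per video keeps every embedding fixed-size while letting retrieval compare finer-grained units, which is one plausible reading of how shot embeddings narrow the gap to frame-level methods.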
Related papers
- Graph Based Temporal Aggregation For Video Retrieval (2020)
- Learning Audio-guided Video Representation With Gated Attention For Video-text Retrieval (2025)
- RAVU: Retrieval Augmented Video Understanding With Compositional Reasoning Over Graph (2025)
- Temporal Context Aggregation For Video Retrieval With Contrastive Learning (2020)
- VVS: Video-to-video Retrieval With Irrelevant Frame Suppression (2023)
- Viseret: A Simple Yet Effective Approach To Moment Retrieval Via Fine-grained Video Segmentation (2021)
- Fine-grained Video-text Retrieval With Hierarchical Graph Reasoning (2020)
- Channel Recurrent Attention Networks For Video Pedestrian Retrieval (2020)