Query By Activity Video In The Wild
2023 · Tao Hu, William Thong, Pascal Mettes, et al.
Abstract
This paper focuses on activity retrieval from a video query in an imbalanced scenario. Current query-by-activity-video literature commonly assumes that all activities have sufficient labelled examples when learning an embedding. In practice, however, this assumption does not hold: only a portion of activities have many examples, while the others are described by only a few. We propose a visual-semantic embedding network that explicitly deals with the imbalanced scenario for activity retrieval. Our network contains two novel modules. The visual alignment module performs a global alignment between the input video and fixed-sized visual bank representations for all activities. The semantic module performs an alignment between the input video and fixed-sized semantic activity representations. By matching videos with both visual and semantic activity representations that are of equal size over all activities, we no longer ignore infrequent activities d
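The idea of matching a video against equally sized visual and semantic representations per activity can be illustrated with a small sketch. This is not the authors' network: the function names (`l2_normalize`, `score_activities`), the use of cosine similarity, the mean over bank entries, and the unweighted sum of the two scores are all assumptions made for illustration. The only point it shows is that every activity contributes the same number of bank entries (K) and one semantic vector, regardless of how many training examples it had.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def score_activities(video_emb, visual_bank, semantic_emb):
    """Score one video embedding against every activity (hypothetical helper).

    video_emb:    (d,)       embedding of the query video
    visual_bank:  (C, K, d)  K bank entries per activity, same K for all C
    semantic_emb: (C, d)     one semantic vector per activity
    Returns a (C,) array; higher means a better match.
    """
    v = l2_normalize(video_emb)
    bank = l2_normalize(visual_bank)
    sem = l2_normalize(semantic_emb)
    # Assumption: average cosine similarity over the K bank entries.
    visual_score = (bank @ v).mean(axis=1)   # (C,)
    # Assumption: cosine similarity to the semantic vector.
    semantic_score = sem @ v                 # (C,)
    # Assumption: equal weighting of both scores.
    return visual_score + semantic_score

# Toy data: 3 activities, 4 bank entries each, 3-dimensional embeddings.
C, K, d = 3, 4, 3
semantic_emb = np.eye(C, d)                              # activity c -> axis c
visual_bank = np.repeat(semantic_emb[:, None, :], K, 1)  # (C, K, d)
video_emb = np.array([0.0, 1.0, 0.0])                    # aligned with activity 1

scores = score_activities(video_emb, visual_bank, semantic_emb)
best_activity = int(np.argmax(scores))
```

Because the bank has shape (C, K, d) with a single K, a rare activity is represented by as many entries as a frequent one, which is the property the abstract emphasises.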
Related papers
- Multiple Visual-semantic Embedding For Video Retrieval From Query Sentence (2020) 2.26
- Multilevel Language And Vision Integration For Text-to-clip Retrieval (2018) 17.67
- Modality-balanced Embedding For Video Retrieval (2022) 7.16
- Use What You Have: Video Retrieval Using Representations From Collaborative Experts (2019) 0.00
- Multi-focused Video Group Activities Hashing (2025) 0.00
- Encode The Unseen: Predictive Video Hashing For Scalable Mid-stream Retrieval (2020) 3.58
- Dual Encoding For Video Retrieval By Text (2020) 16.05
- Learning Joint Representations Of Videos And Sentences With Web Image Search (2016) 12.93