Visual Relationship Detection With Language Priors
2016 · Cewu Lu, Ranjay Krishna, Michael Bernstein, et al.
Abstract
Visual relationships capture a wide variety of interactions between pairs of objects in images (e.g. "man riding bicycle" and "man pushing bicycle"). Consequently, the set of possible relationships is extremely large and it is difficult to obtain sufficient training examples for all possible relationships. Because of this limitation, previous work on visual relationship detection has concentrated on predicting only a handful of relationships. Though most relationships are infrequent, their objects (e.g. "man" and "bicycle") and predicates (e.g. "riding" and "pushing") independently occur more frequently. We propose a model that uses this insight to train visual models for objects and predicates individually and later combines them to predict multiple relationships per image. We improve on prior work by leveraging language priors from semantic word embeddings to fine-tune the likelihood of a predicted relationship. Our model can scale to predict thousands of types of relationshi
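The idea in the abstract, scoring objects and predicates separately and then adjusting the combined score with a prior computed from word embeddings, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the multiplicative combination, the linear form of the prior, and all numeric values below are assumptions chosen for illustration only.

```python
def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def language_prior(subj_vec, obj_vec, weights, bias):
    """Hypothetical prior: a linear function of the concatenated
    subject and object word embeddings, one (weights, bias) per predicate."""
    return dot(weights, subj_vec + obj_vec) + bias


def rank_relationships(object_scores, predicate_scores, embeddings, projections):
    """Score every (subject, predicate, object) triple and sort descending.

    object_scores:    {category: detector confidence}
    predicate_scores: {(subject, object): {predicate: visual confidence}}
    embeddings:       {category: word vector as a list of floats}
    projections:      {predicate: (weights, bias)} for the language prior
    """
    ranked = []
    for (subj, obj), preds in predicate_scores.items():
        for pred, p_pred in preds.items():
            weights, bias = projections[pred]
            prior = language_prior(embeddings[subj], embeddings[obj], weights, bias)
            # Assumed combination: product of the visual scores and the prior.
            visual = object_scores[subj] * p_pred * object_scores[obj]
            ranked.append(((subj, pred, obj), visual * prior))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Toy inputs (made up for illustration, not taken from the paper).
object_scores = {"man": 0.9, "bicycle": 0.8}
predicate_scores = {("man", "bicycle"): {"riding": 0.6, "pushing": 0.3}}
embeddings = {"man": [1.0, 0.0], "bicycle": [0.0, 1.0]}
projections = {
    "riding": ([0.5, 0.0, 0.0, 0.5], 0.0),
    "pushing": ([0.2, 0.0, 0.0, 0.2], 0.0),
}

ranked = rank_relationships(object_scores, predicate_scores, embeddings, projections)
for triple, score in ranked:
    print(triple, round(score, 4))
```

Because objects and predicates are scored independently, a triple that never appeared in training can still receive a score as long as its parts were seen, which is the property the abstract relies on to cover infrequent relationships.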
Related papers
- PriorCLIP: Visual Prior Guided Vision-language Model For Remote Sensing Image-text Retrieval (2024)
- VITR: Augmenting Vision Transformers With Relation-focused Learning For Cross-modal Information Retrieval (2023)
- Tensor Composition Net For Visual Relationship Prediction (2020)
- CAVL: Learning Contrastive And Adaptive Representations Of Vision And Language (2023)
- Leveraging Retrieval-augmented Tags For Large Vision-language Understanding In Complex Scenes (2024)
- Visual Model Checking: Graph-based Inference Of Visual Routines For Image Retrieval (2026)
- Object Priors For Classifying And Localizing Unseen Actions (2021)
- Learning The Visualness Of Text Using Large Vision-language Models (2023)