Uncertainty-based Cross-modal Retrieval With Probabilistic Representations
2022 · Leila Pishdad, Ran Zhang, Konstantinos G. Derpanis, et al.
Abstract
Probabilistic embeddings have proven useful for capturing polysemous word meanings, as well as ambiguity in image matching. In this paper, we study the advantages of probabilistic embeddings in a cross-modal setting (i.e., text and images), and propose a simple approach that replaces the standard vector point embeddings in extant image-text matching models with probabilistic distributions that are parametrically learned. Our guiding hypothesis is that the uncertainty encoded in the probabilistic embeddings captures the cross-modal ambiguity in the input instances, and that it is through capturing this uncertainty that the probabilistic models can perform better at downstream tasks, such as image-to-text or text-to-image retrieval. Through extensive experiments on standard and new benchmarks, we show a consistent advantage for probabilistic representations in cross-modal retrieval, and validate the ability of our embeddings to capture uncertainty.
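As a rough illustration of the idea in the abstract, the sketch below swaps a point embedding for a diagonal Gaussian with a learned mean and variance, then scores an image-text pair by the expected squared distance between the two distributions. The abstract does not specify the paper's parameterization or matching function, so the diagonal-Gaussian choice, the linear heads, and the names `gaussian_embedding`, `expected_sq_distance`, `w_mu`, and `w_logvar` are all assumptions made for illustration only.

```python
import numpy as np


def gaussian_embedding(features, w_mu, w_logvar):
    """Map a feature vector to the mean and std of a diagonal Gaussian.

    Hypothetical linear heads: one predicts the mean, the other the
    log-variance (so the variance is always positive).
    """
    mu = features @ w_mu
    sigma = np.exp(0.5 * (features @ w_logvar))
    return mu, sigma


def expected_sq_distance(mu_a, sigma_a, mu_b, sigma_b):
    """Closed-form E||a - b||^2 for independent diagonal Gaussians a and b.

    Equals ||mu_a - mu_b||^2 + sum(sigma_a^2) + sum(sigma_b^2), so a pair
    is penalized both for distant means and for high uncertainty.
    """
    mean_term = np.sum((mu_a - mu_b) ** 2)
    var_term = np.sum(sigma_a ** 2) + np.sum(sigma_b ** 2)
    return float(mean_term + var_term)


def total_variance(sigma):
    """A simple scalar uncertainty summary (assumed, not from the paper)."""
    return float(np.sum(sigma ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat_dim, emb_dim = 8, 4
    # Stand-ins for image and text encoder outputs and learned head weights.
    image_feat = rng.normal(size=feat_dim)
    text_feat = rng.normal(size=feat_dim)
    w_mu = rng.normal(size=(feat_dim, emb_dim))
    w_logvar = 0.1 * rng.normal(size=(feat_dim, emb_dim))

    img_mu, img_sigma = gaussian_embedding(image_feat, w_mu, w_logvar)
    txt_mu, txt_sigma = gaussian_embedding(text_feat, w_mu, w_logvar)
    print("match distance:", expected_sq_distance(img_mu, img_sigma, txt_mu, txt_sigma))
    print("image uncertainty:", total_variance(img_sigma))
```

In a retrieval setting, candidates would be ranked by ascending distance, and the per-instance variance could be inspected as a proxy for how ambiguous an input is. In practice the two modalities would have separate encoders and heads; the shared weights here only keep the sketch short.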
Related papers
- Probabilistic Embeddings For Cross-modal Retrieval (2021) · 21.70
- Ranking-aware Uncertainty For Text-guided Image Retrieval (2023) · 0.00
- Bayesian Triplet Loss: Uncertainty Quantification In Image Retrieval (2020) · 11.49
- Probabilistic Compositional Embeddings For Multimodal Image Retrieval (2022) · 13.80
- Exploring Uncertainty In Conditional Multi-modal Retrieval Systems (2019) · 0.00
- Prototype-based Aleatoric Uncertainty Quantification For Cross-modal Retrieval (2023) · 6.50
- Heterogeneous Uncertainty-guided Composed Image Retrieval With Fine-grained Probabilistic Learning (2026) · 0.00
- Look, Imagine And Match: Improving Textual-visual Cross-modal Retrieval With Generative Models (2017) · 18.52