SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text
2022 · Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Aneeshan Sain, et al.
Abstract
In this paper, we extend scene understanding to include that of human sketch. The result is a complete trilogy of scene representation from three diverse and complementary modalities -- sketch, photo, and text. Instead of learning a rigid three-way embedding and being done with it, we focus on learning a flexible joint embedding that fully supports the "optionality" that this complementarity brings. Our embedding supports optionality on two axes: (i) optionality across modalities -- use any combination of modalities as query for downstream tasks like retrieval, (ii) optionality across tasks -- simultaneously utilising the embedding for either discriminative (e.g., retrieval) or generative tasks (e.g., captioning). This provides flexibility to end-users by exploiting the best of each modality, therefore serving the very purpose behind our proposal of a trilogy in the first place. First, a combination of information-bottleneck and conditional invertible neural networks disentangle the moda
Related papers
- Sketchtriplet: Self-supervised Scenarized Sketch-text-image Triplet Generation (2024)
- Partially Does It: Towards Scene-level FG-SBIR With Partial Input (2022)
- Back To The Drawing Board: Rethinking Scene-level Sketch-based Image Retrieval (2025)
- Scene Designer: A Unified Model For Scene Search And Synthesis From Sketch (2021)
- Beyond Visual Semantics: Exploring The Role Of Scene Text In Image Understanding (2019)
- Scenarios: A New Representation For Complex Scene Understanding (2018)
- You'll Never Walk Alone: A Sketch And Text Duet For Fine-grained Image Retrieval (2024)
- Triplet-aware Scene Graph Embeddings (2019)