Learning Cross-modal Deep Embeddings For Multi-object Image Retrieval Using Text And Sketch
2018 · Sounak Dey, Anjan Dutta, Suman K. Ghosh, et al.
Abstract
In this work we introduce a cross-modal image retrieval system that allows both text and sketch as input modalities for the query. A cross-modal deep network architecture is formulated to jointly model the sketch and text input modalities as well as the image output modality, learning a common embedding between text and images and between sketches and images. In addition, an attention model is used to selectively focus on the different objects in the image, allowing for retrieval with multiple objects in the query. Experiments show that the proposed method performs best in both single- and multiple-object image retrieval on standard datasets.
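The abstract gives no architectural details, so the following is only a minimal sketch of the general idea of query-guided attention over image regions in a shared embedding space, not the authors' implementation. It assumes the query (text or sketch) and the image regions have already been mapped into the same vector space by encoders that are not shown. The function names (`attend`, `rank_images`), the dot-product attention, and the toy two-dimensional vectors are all hypothetical choices made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def attend(query, regions):
    # Assumption: attention weights come from query-region dot products.
    weights = softmax([dot(query, r) for r in regions])
    # Pool the region embeddings into one image embedding.
    pooled = [sum(w * r[i] for w, r in zip(weights, regions))
              for i in range(len(query))]
    return pooled, weights

def rank_images(query, gallery):
    # Score each image by cosine similarity between the query and its
    # attention-pooled embedding, then sort from best to worst.
    scored = []
    for name, regions in gallery.items():
        pooled, _ = attend(query, regions)
        scored.append((cosine(query, pooled), name))
    return [name for _, name in sorted(scored, reverse=True)]

# Toy data: each image is a list of region embeddings (hypothetical values).
gallery = {
    "img_a": [[1.0, 0.0], [0.0, 1.0]],
    "img_b": [[0.0, 1.0], [-1.0, 0.0]],
}
query = [1.0, 0.0]  # stands in for an embedded text or sketch query
ranking = rank_images(query, gallery)
```

In this toy setup a multi-object query could be handled by running `attend` once per query object and combining the scores, though how the paper combines them is not stated in the abstract.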
Related papers
- You'll Never Walk Alone: A Sketch And Text Duet For Fine-grained Image Retrieval (2024) 9.41
- Cross-modal Hierarchical Modelling For Fine-grained Sketch Based Image Retrieval (2020) 6.77
- Revisiting Cross Modal Retrieval (2018) 0.00
- Cross-modal Fusion Distillation For Fine-grained Sketch-based Image Retrieval (2022) 2.68
- A Sketch Is Worth A Thousand Words: Image Retrieval With Text And Sketch (2022) 10.35
- Sketch And Text Synergy: Fusing Structural Contours And Descriptive Attributes For Fine-grained Image Retrieval (2026) 0.00
- Image Search Using Multilingual Texts: A Cross-modal Learning Approach Between Image And Text (2019) 0.00
- Cross-modal Subspace Learning For Fine-grained Sketch-based Image Retrieval (2017) 13.34