Is Cross-modal Information Retrieval Possible Without Training?
2023 · Hyunjin Choi, Hyunjae Lee, Seongho Joe, et al.
Abstract
Encoded representations from a pretrained deep learning model (e.g., BERT text embeddings, penultimate CNN layer activations of an image) convey a rich set of features beneficial for information retrieval. Embeddings for a particular modality of data occupy a high-dimensional space of their own, but that space can be semantically aligned to another by a simple mapping without training a deep neural net. In this paper, we take a simple mapping computed from least squares and singular value decomposition (SVD) as a solution to the Procrustes problem to serve as a means of cross-modal information retrieval. That is, given information in one modality such as text, the mapping helps us locate a semantically equivalent data item in another modality such as image. Using off-the-shelf pretrained deep learning models, we have experimented with the aforementioned simple cross-modal mappings in tasks of text-to-image and image-to-text retrieval. Despite their simplicity, our mappings perform reasonably well reachin
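The kind of mapping the abstract describes can be illustrated with a minimal sketch. This is not the authors' code: it assumes paired text and image embeddings of the same dimension, uses synthetic data in place of real pretrained-model outputs, and the names `least_squares_map`, `procrustes_map`, and `retrieve` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def least_squares_map(X, Y):
    """Unconstrained W minimizing ||X W - Y||_F (ordinary least squares)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def procrustes_map(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F (orthogonal Procrustes).

    With X^T Y = U S V^T, the minimizer is W = U V^T.
    Assumes X and Y have the same embedding dimension.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def retrieve(query, W, gallery):
    """Map a query into the gallery space and return the index of the
    gallery item with the highest cosine similarity."""
    mapped = query @ W
    mapped = mapped / np.linalg.norm(mapped)
    gallery_unit = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return int(np.argmax(gallery_unit @ mapped))

# Synthetic stand-ins for paired embeddings (assumption: real use would
# take these from off-the-shelf pretrained text and image encoders).
rng = np.random.default_rng(0)
n_pairs, dim = 50, 16
text_emb = rng.standard_normal((n_pairs, dim))
true_rotation, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
image_emb = text_emb @ true_rotation + 0.01 * rng.standard_normal((n_pairs, dim))

W_ls = least_squares_map(text_emb, image_emb)
W_proc = procrustes_map(text_emb, image_emb)

# Text-to-image retrieval: which image is closest to the mapped text item 3?
print(retrieve(text_emb[3], W_proc, image_emb))
```

Neither mapping involves gradient-based training; both are closed-form solutions computed once from paired embeddings. For image-to-text retrieval under the orthogonal variant, the transpose `W_proc.T` can serve as the reverse mapping, since an orthogonal matrix is inverted by its transpose.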
Related papers
- Do Cross Modal Systems Leverage Semantic Relationships? (2019)
- Multimodal Representation Alignment For Cross-modal Information Retrieval (2025)
- Revisiting Cross Modal Retrieval (2018)
- Towards Efficient Cross-modal Visual Textual Retrieval Using Transformer-encoder Deep Features (2021)
- Adversarial Cross-modal Retrieval Via Learning And Transferring Single-modal Similarities (2019)
- Cross-modal Image Retrieval With Deep Mutual Information Maximization (2021)
- End-to-end Cross-modality Retrieval With CCA Projections And Pairwise Ranking Loss (2017)
- Do Neural Network Cross-modal Mappings Really Bridge Modalities? (2018)