Cross-modal Food Retrieval: Learning A Joint Embedding Of Food Images And Recipes With Semantic Consistency And Attention Mechanism
2020 · Hao Wang, Doyen Sahoo, Chenghao Liu, et al.
Abstract
Food retrieval is an important task for analyzing food-related information, where we are interested in retrieving relevant information about the queried food item, such as ingredients, cooking instructions, etc. In this paper, we investigate cross-modal retrieval between food images and cooking recipes. The goal is to learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another. Two major challenges in addressing this problem are 1) large intra-variance and small inter-variance across cross-modal food data; and 2) difficulties in obtaining discriminative recipe representations. To address these two problems, we propose Semantic-Consistent and Attention-based Networks (SCAN), which regularize the embeddings of the two modalities through aligning output semantic probabilities. In addition, we exploit a self-attention mechanism to improve the embedding of recipes. We evaluate the performance of the pro…
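A minimal pure-Python sketch of the three ingredients the abstract names, under stated assumptions rather than as SCAN's actual implementation: retrieval by ranking recipe embeddings by cosine similarity to an image embedding in the shared space; a semantic-consistency term written here as a symmetric KL divergence between the class probabilities predicted from each modality (an assumed form, since the abstract only says the output semantic probabilities are aligned); and attention pooling of recipe sentence vectors against a fixed query vector, a simplified stand-in for the self-attention mechanism. All function names are hypothetical, and in a real model the vectors, logits, and query would come from trained encoders.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    # KL(p || q) over two discrete class distributions; eps guards log(0).
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def semantic_consistency_loss(image_logits: Sequence[float],
                              recipe_logits: Sequence[float]) -> float:
    # ASSUMED form: symmetric KL between the class distributions predicted
    # from the image branch and the recipe branch. It is zero when both
    # modalities predict the same semantic (e.g. food-category) probabilities.
    p_img = softmax(image_logits)
    p_rec = softmax(recipe_logits)
    return 0.5 * (kl_divergence(p_img, p_rec) + kl_divergence(p_rec, p_img))


def attention_pool(sentences: List[Vector], query: Vector) -> Vector:
    # Simplified attention pooling (a stand-in, not SCAN's encoder): score each
    # instruction-sentence vector by its dot product with a query vector,
    # normalize the scores with softmax, and return the weighted sum.
    scores = [sum(s_d * q_d for s_d, q_d in zip(s, query)) for s in sentences]
    weights = softmax(scores)
    dim = len(sentences[0])
    return [sum(w * s[d] for w, s in zip(weights, sentences)) for d in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def rank_recipes(image_emb: Vector, recipe_embs: List[Vector]) -> List[int]:
    # Image-to-recipe retrieval: recipe indices sorted by descending similarity.
    return sorted(range(len(recipe_embs)),
                  key=lambda i: cosine(image_emb, recipe_embs[i]),
                  reverse=True)


if __name__ == "__main__":
    recipe_emb = attention_pool([[1.0, 0.0], [0.0, 1.0]], query=[2.0, 0.0])
    print("pooled recipe embedding:", recipe_emb)
    print("ranking:", rank_recipes([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]]))
    print("consistency loss:", semantic_consistency_loss([3.0, 0.0, 0.0], [0.0, 0.0, 3.0]))
```

The symmetric KL is one reasonable way to "align" two probability vectors without privileging either modality; a one-directional KL or a cross-entropy against shared class labels would be equally plausible readings of the abstract.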
Related papers
- Cross-modal Retrieval In The Cooking Context: Learning Semantic Text-image Embeddings (2018) · 0.00
- CHEF: Cross-modal Hierarchical Embeddings For Food Domain Retrieval (2021) · 8.35
- MCEN: Bridging Cross-modal Gap Between Cooking Recipes And Dish Images With Latent Variable Model (2020) · 13.39
- SIMMER: Cross-modal Food Image-recipe Retrieval Via MLLM-based Embedding (2026) · 0.00
- Learning TFIDF Enhanced Joint Embedding For Recipe-image Cross-modal Retrieval Service (2021) · 10.85
- Transformer Decoders With Multimodal Regularization For Cross-modal Food Retrieval (2022) · 14.17
- Recipe1M+: A Dataset For Learning Cross-modal Embeddings For Cooking Recipes And Food Images (2018) · 17.24
- Images & Recipes: Retrieval In The Cooking Context (2018) · 3.58