CHEF: Cross-modal Hierarchical Embeddings For Food Domain Retrieval
2021 · Hai X. Pham, Ricardo Guerrero, Jiatong Li, et al.
Abstract
Despite the abundance of multi-modal data, such as image-text pairs, there has been little effort in understanding the individual entities and their different roles in the construction of these data instances. In this work, we endeavour to discover the entities and their corresponding importance in cooking recipes automatically as a visual-linguistic association problem. More specifically, we introduce a novel cross-modal learning framework to jointly model the latent representations of images and text in the food image-recipe association and retrieval tasks. This model allows one to discover complex functional and hierarchical relationships between images and text, and among textual parts of a recipe including title, ingredients and cooking instructions. Our experiments show that by making use of efficient tree-structured Long Short-Term Memory as the text encoder in our computational cross-modal retrieval framework, we are not only able to identify the main ingredients and cooking a
Related papers
- Cross-modal Retrieval In The Cooking Context: Learning Semantic Text-image Embeddings (2018) · 0.00
- Revamping Cross-modal Recipe Retrieval With Hierarchical Transformers And Self-supervised Learning (2021) · 13.97
- Cross-modal Food Retrieval: Learning A Joint Embedding Of Food Images And Recipes With Semantic Consistency And Attention Mechanism (2020) · 12.10
- SIMMER: Cross-modal Food Image-recipe Retrieval Via MLLM-based Embedding (2026) · 0.00
- MCEN: Bridging Cross-modal Gap Between Cooking Recipes And Dish Images With Latent Variable Model (2020) · 13.39
- Transformer Decoders With Multimodal Regularization For Cross-modal Food Retrieval (2022) · 14.17
- Recipe1M+: A Dataset For Learning Cross-modal Embeddings For Cooking Recipes And Food Images (2018) · 17.24
- Learning TFIDF Enhanced Joint Embedding For Recipe-image Cross-modal Retrieval Service (2021) · 10.85