Learning Program Representations For Food Images And Cooking Recipes
2022 · Dim P. Papadopoulos, Enrique Mora, Nadiia Chepurko, et al.
Abstract
In this paper, we are interested in modeling a how-to instructional procedure, such as a cooking recipe, with a meaningful and rich high-level representation. Specifically, we propose to represent cooking recipes and food images as cooking programs. Programs provide a structured representation of the task, capturing cooking semantics and sequential relationships of actions in the form of a graph. This allows them to be easily manipulated by users and executed by agents. To this end, we build a model that is trained to learn a joint embedding between recipes and food images via self-supervision and jointly generate a program from this embedding as a sequence. To validate our idea, we crowdsource programs for cooking recipes and show that: (a) projecting the image-recipe embeddings into programs leads to better cross-modal retrieval results; (b) generating programs from images leads to better recognition results compared to predicting raw cooking instructions; and (c) we can generate foo
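To make the idea of a program that is both a graph and a sequence more concrete, here is a minimal sketch. It is not the authors' program grammar: the `Step` class, the action names, and the `<stepN>` reference tokens are all hypothetical and chosen only for illustration. It assumes each step names an action and takes either raw ingredients or the outputs of earlier steps as arguments.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical encoding (not from the paper): an argument is either an
# ingredient name (str) or the index (int) of an earlier step whose
# output is reused.
Arg = Union[str, int]


@dataclass
class Step:
    action: str
    args: List[Arg]


def edges(program: List[Step]) -> List[Tuple[int, int]]:
    """Return dependency edges as (producer step, consumer step) pairs."""
    out = []
    for i, step in enumerate(program):
        for arg in step.args:
            if isinstance(arg, int):
                # A step may only consume the output of an earlier step.
                if not 0 <= arg < i:
                    raise ValueError(f"step {i} references invalid step {arg}")
                out.append((arg, i))
    return out


def to_tokens(program: List[Step]) -> List[str]:
    """Flatten the program into a token sequence a decoder could emit."""
    tokens = []
    for step in program:
        tokens.append(step.action)
        tokens.append("(")
        for arg in step.args:
            tokens.append(f"<step{arg}>" if isinstance(arg, int) else arg)
        tokens.append(")")
    return tokens


# Illustrative program: chop an onion, heat oil, then saute both results.
program = [
    Step("chop", ["onion"]),
    Step("heat", ["oil"]),
    Step("saute", [0, 1]),
]

print(edges(program))
print(" ".join(to_tokens(program)))
```

The same object can be read two ways: `edges` exposes the graph structure that a user or agent could manipulate, while `to_tokens` gives the flat sequence form that a sequence generator could produce.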
Related papers
- Cross-modal Retrieval In The Cooking Context: Learning Semantic Text-image Embeddings (2018) 0.00
- Recipe1M+: A Dataset For Learning Cross-modal Embeddings For Cooking Recipes And Food Images (2018) 17.24
- Images & Recipes: Retrieval In The Cooking Context (2018) 3.58
- Cross-modal Food Retrieval: Learning A Joint Embedding Of Food Images And Recipes With Semantic Consistency And Attention Mechanism (2020) 12.10
- CHEF: Cross-modal Hierarchical Embeddings For Food Domain Retrieval (2021) 8.35
- Towards Unbiased Cross-modal Representation Learning For Food Image-to-recipe Retrieval (2025) 0.00
- SIMMER: Cross-modal Food Image-recipe Retrieval Via MLLM-based Embedding (2026) 0.00
- Mitigating Cross-modal Representation Bias For Multicultural Image-to-recipe Retrieval (2025) 0.00