Towards Unbiased Cross-modal Representation Learning For Food Image-to-recipe Retrieval
2025 · Qing Wang, Chong-Wah Ngo, Ee-Peng Lim
Abstract
This paper addresses the challenges of learning representations for recipes and food images in cross-modal retrieval. Because the relationship between a recipe and its cooked dish is one of cause and effect, treating a recipe as a text source that describes the visual appearance of a dish, as existing approaches do, creates a bias that misleads image-recipe similarity judgment. Specifically, a food image may not capture every detail in a recipe equally, due to factors such as the cooking process, dish presentation, and image-capturing conditions. Current representation learning tends to capture the dominant visual-text alignment while overlooking subtle variations that determine retrieval relevance. In this paper, we model this bias in cross-modal representation learning using causal theory. The causal view of the problem suggests that ingredients are one source of confounding and that a simple backdoor adjustment can alleviate the bias. By causal intervention, we
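As a generic illustration of the backdoor adjustment mentioned in the abstract, the sketch below computes P(Y=1 | do(X=x)) = sum over z of P(Y=1 | X=x, Z=z) P(Z=z) for a toy discrete confounder and compares it with the unadjusted conditional P(Y=1 | X=x). The variable names (X, Y, Z), the function names, and all probabilities are assumptions made up for this example; the abstract does not give the paper's actual formulation, so this should not be read as the authors' method.

```python
# Toy backdoor adjustment over a binary confounder Z.
# Hypothetical setup: Z stands in for an ingredient-related confounder,
# X for an observed feature, Y for a relevance outcome.
# All numbers are invented for illustration, not taken from the paper.

def backdoor_adjust(p_y_given_xz, p_z, x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z)."""
    return sum(p_y_given_xz[(x, z)] * p_z[z] for z in p_z)

def observational(p_y_given_xz, p_z_given_x, x):
    """P(Y=1 | X=x) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z | X=x)."""
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z_given_x[x].items())

# Marginal of the confounder, consistent with P(X=0) = P(X=1) = 0.5.
p_z = {0: 0.5, 1: 0.5}
# The confounder is correlated with X, which is what biases the naive estimate.
p_z_given_x = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}
# Outcome probability for each (x, z) combination.
p_y_given_xz = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.3, (1, 1): 0.7}

adjusted = backdoor_adjust(p_y_given_xz, p_z, x=1)
observed = observational(p_y_given_xz, p_z_given_x, x=1)
print(f"P(Y=1 | do(X=1)) = {adjusted:.2f}")
print(f"P(Y=1 | X=1)     = {observed:.2f}")
```

In this toy setting the unadjusted conditional is larger than the adjusted value because Z=1 is over-represented when X=1; averaging over the marginal P(Z) removes that dependence.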
Related papers
- Mitigating Cross-modal Representation Bias For Multicultural Image-to-recipe Retrieval (2025) 0.00
- Cross-modal Retrieval In The Cooking Context: Learning Semantic Text-image Embeddings (2018) 0.00
- Cross-modal Food Retrieval: Learning A Joint Embedding Of Food Images And Recipes With Semantic Consistency And Attention Mechanism (2020) 12.10
- Cross-lingual Adaptation For Recipe Retrieval With Mixup (2022) 5.84
- CHEF: Cross-modal Hierarchical Embeddings For Food Domain Retrieval (2021) 8.35
- Cross-modal Retrieval And Synthesis (X-MRS): Closing The Modality Gap In Shared Representation Learning (2020) 0.00
- Images & Recipes: Retrieval In The Cooking Context (2018) 3.58
- Revamping Cross-modal Recipe Retrieval With Hierarchical Transformers And Self-supervised Learning (2021) 13.97