Dividing And Conquering Cross-modal Recipe Retrieval: From Nearest Neighbours Baselines To SoTA
2019 · Mikhail Fain, Niall Twomey, Andrey Ponikar, et al.
Abstract
We propose a novel non-parametric method for cross-modal recipe retrieval which is applied on top of precomputed image and text embeddings. By combining our method with standard approaches for building image and text encoders, trained independently with a self-supervised classification objective, we create a baseline model which outperforms most existing methods on a challenging image-to-recipe task. We also use our method for comparing image and text encoders trained using different modern approaches, thus addressing the issues hindering the development of novel methods for cross-modal recipe retrieval. We demonstrate how to use the insights from model comparison and extend our baseline model with standard triplet loss that improves state-of-the-art on the Recipe1M dataset by a large margin, while using only precomputed features and with much less complexity than existing methods. Further, our approach readily generalizes beyond recipe retrieval to other challenging domains, achieving
Related papers
- Revamping Cross-modal Recipe Retrieval With Hierarchical Transformers And Self-supervised Learning (2021) · 13.97
- Cross-modal Retrieval In The Cooking Context: Learning Semantic Text-image Embeddings (2018) · 0.00
- Cross-modal Food Retrieval: Learning A Joint Embedding Of Food Images And Recipes With Semantic Consistency And Attention Mechanism (2020) · 12.10
- Transformer Decoders With Multimodal Regularization For Cross-modal Food Retrieval (2022) · 14.17
- Recipe1M+: A Dataset For Learning Cross-modal Embeddings For Cooking Recipes And Food Images (2018) · 17.24
- CHEF: Cross-modal Hierarchical Embeddings For Food Domain Retrieval (2021) · 8.35
- Transformer-based Cross-modal Recipe Embeddings With Large Batch Training (2022) · 5.84
- SIMMER: Cross-modal Food Image-recipe Retrieval Via MLLM-based Embedding (2026) · 0.00