Learning TFIDF Enhanced Joint Embedding For Recipe-image Cross-modal Retrieval Service
2021 · Zhongwei Xie, Ling Liu, Yanzhao Wu, et al.
Abstract
It is widely acknowledged that learning joint embeddings of recipes with images is challenging due to the diverse composition and deformation of ingredients in cooking procedures. We present a Multi-modal Semantics enhanced Joint Embedding approach (MSJE) for learning a common feature space between the two modalities (text and image), with the ultimate goal of providing high-performance cross-modal retrieval services. Our MSJE approach has three unique features. First, we extract the TFIDF feature from the title, ingredients and cooking instructions of recipes. By determining the significance of word sequences through combining LSTM learned features with their TFIDF features, we encode a recipe into a TFIDF weighted vector for capturing significant key terms and how such key terms are used in the corresponding cooking instructions. Second, we combine the recipe TFIDF feature with the recipe sequence feature extracted through two-stage LSTM networks, which is effective in capturing the
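To make the first feature concrete, the TFIDF weighting of recipe text can be sketched as below. This is a minimal illustration, not the authors' implementation: the smoothed IDF formula, the whitespace tokenizer, the function name `tfidf_vectors`, and the toy recipes are all assumptions, and the abstract does not say which TFIDF variant MSJE uses or how the vector is combined with the LSTM features.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocab, vectors): one TF-IDF weighted vector per document."""
    # Assumption: simple lowercase whitespace tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        # Term frequency (relative count) scaled by IDF.
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical toy recipes: title, ingredients and instructions
# concatenated into a single string per recipe.
recipes = [
    "tomato soup tomato onion garlic simmer tomato with onion",
    "garlic bread bread butter garlic bake bread with butter",
]
vocab, vectors = tfidf_vectors(recipes)
```

In this sketch a term that is frequent in one recipe and absent from the other, such as `tomato`, receives a larger weight than a term shared by both, such as `with`, which is the sense in which a TFIDF vector highlights significant key terms.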
Related papers
- Cross-modal Food Retrieval: Learning A Joint Embedding Of Food Images And Recipes With Semantic Consistency And Attention Mechanism (2020)
- SIMMER: Cross-modal Food Image-recipe Retrieval Via MLLM-based Embedding (2026)
- Cross-modal Retrieval In The Cooking Context: Learning Semantic Text-image Embeddings (2018)
- CHEF: Cross-modal Hierarchical Embeddings For Food Domain Retrieval (2021)
- Transformer Decoders With Multimodal Regularization For Cross-modal Food Retrieval (2022)
- Recipe1M+: A Dataset For Learning Cross-modal Embeddings For Cooking Recipes And Food Images (2018)
- Transformer-based Cross-modal Recipe Embeddings With Large Batch Training (2022)
- Revamping Cross-modal Recipe Retrieval With Hierarchical Transformers And Self-supervised Learning (2021)