MAKE: Vision-language Pre-training Based Product Retrieval In Taobao Search
2023 · Xiaoyang Zheng, Zilong Wang, Ke Xu, et al.
Abstract
Taobao Search consists of two phases: the retrieval phase and the ranking phase. Given a user query, the retrieval phase returns a subset of candidate products for the subsequent ranking phase. Recently, the paradigm of pre-training and fine-tuning has shown its potential in incorporating visual clues into retrieval tasks. In this paper, we focus on solving the problem of text-to-multimodal retrieval in Taobao Search. We consider that users' attention to titles or images varies across products. Hence, we propose a novel Modal Adaptation module for cross-modal fusion, which helps assign appropriate weights to texts and images across products. Furthermore, in e-commerce search, user queries tend to be brief, which leads to a significant semantic imbalance between user queries and product titles. Therefore, we design a separate text encoder and a Keyword Enhancement mechanism to enrich the query representations and improve text-to-multimodal matching. To this end, we present a novel vision-lan
Related papers
- Delving Into E-commerce Product Retrieval With Vision-language Pre-training (2023) 6.77
- V\(^2\)L: Leveraging Vision And Vision-language Models Into Large-scale Product Retrieval (2022) 0.00
- Embedding-based Product Retrieval In Taobao Search (2021) 13.70
- Multi-objective Personalized Product Retrieval In Taobao Search (2022) 0.00
- Product1m: Towards Weakly Supervised Instance-level Product Retrieval Via Cross-modal Pretraining (2021) 12.61
- Bringing Multimodality To Amazon Visual Search System (2024) 6.34
- Unified Vision-language Representation Modeling For E-commerce Same-style Products Retrieval (2023) 6.34
- Zero-shot Retrieval For Scalable Visual Search In A Two-sided Marketplace (2025) 1.57