CommerceMM: Large-scale Commerce Multimodal Representation Learning With Omni Retrieval
2022 · Licheng Yu, Jun Chen, Animesh Sinha, et al.
Abstract
We introduce CommerceMM, a multimodal model that provides a diverse and granular understanding of the commerce topics associated with a given piece of content (image, text, or image+text) and generalizes to a wide range of tasks, including Multimodal Categorization, Image-Text Retrieval, Query-to-Product Retrieval, and Image-to-Product Retrieval. We follow the pre-training + fine-tuning regime and present 5 effective pre-training tasks on image-text pairs. To embrace more common and diverse commerce data with text-to-multimodal, image-to-multimodal, and multimodal-to-multimodal mapping, we propose another 9 novel cross-modal and cross-pair retrieval tasks, called Omni-Retrieval pre-training. The pre-training is conducted efficiently, with only two forward/backward updates for the combined 14 tasks. Extensive experiments and analysis show the effectiveness of each task. When combining all pre-training tasks, our model achieves state-of-the-a
Related papers
- ASR-enhanced Multimodal Representation Learning For Cross-domain Product Retrieval (2024) 0.00
- Category-oriented Representation Learning For Image To Multi-modal Retrieval (2023) 0.00
- Product1M: Towards Weakly Supervised Instance-level Product Retrieval Via Cross-modal Pretraining (2021) 12.61
- AFMRL: Attribute-enhanced Fine-grained Multi-modal Representation Learning In E-commerce (2026) 0.00
- Multimodal Semantic Retrieval For Product Search (2025) 3.58
- CREM: Compression-driven Representation Enhancement For Multimodal Retrieval And Comprehension (2026) 0.00
- Entity-graph Enhanced Cross-modal Pretraining For Instance-level Product Retrieval (2022) 5.24
- Beyond Global Similarity: Towards Fine-grained, Multi-condition Multimodal Retrieval (2026) 2.20