Optimizing Product Deduplication In E-commerce With Multimodal Embeddings
2025 · Aysenur Kulunk, Berk Taskin, M. Furkan Eseoglu, et al.
Abstract
In large-scale e-commerce marketplaces, duplicate product listings frequently cause consumer confusion and operational inefficiencies, degrading trust in the platform and increasing costs. Traditional keyword-based search methods fail to identify duplicates accurately because they rely on exact textual matches and neglect the semantic similarities inherent in product titles. To address these challenges, we introduce a scalable, multimodal product deduplication system designed specifically for the e-commerce domain. Our approach employs a domain-specific text model built on the BERT architecture in conjunction with Masked Autoencoders for image representations. Both architectures are augmented with dimensionality reduction techniques to produce compact 128-dimensional embeddings without significant information loss. Complementing this, we also developed a novel decider model that leverages both text and image vectors. By integrating these feature extraction mechanisms with Mi
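The pipeline described in the abstract (encode, reduce to 128 dimensions, then decide from both text and image vectors) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: PCA stands in for the unspecified "dimensionality reduction techniques", a fixed average of text and image cosine similarity stands in for the learned decider model, and the function names `pca_reduce`, `l2_normalize`, and `duplicate_candidates` are hypothetical. The inputs are assumed to be precomputed encoder outputs, one row per product.

```python
import numpy as np


def pca_reduce(x: np.ndarray, k: int) -> np.ndarray:
    """Project the rows of x onto their top-k principal components.

    Assumption: PCA is used only as a stand-in; the abstract does not
    name the dimensionality reduction technique.
    """
    centered = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def duplicate_candidates(text_vecs: np.ndarray,
                         image_vecs: np.ndarray,
                         threshold: float = 0.9) -> list[tuple[int, int]]:
    """Return pairs (i, j) with i < j whose combined similarity >= threshold.

    Assumption: a plain average of the two cosine similarities replaces
    the paper's learned decider model.
    """
    t = l2_normalize(text_vecs)
    v = l2_normalize(image_vecs)
    sim = 0.5 * (t @ t.T) + 0.5 * (v @ v.T)
    # Only inspect the strict upper triangle, so each pair appears once.
    rows, cols = np.triu_indices(sim.shape[0], k=1)
    keep = sim[rows, cols] >= threshold
    return list(zip(rows[keep].tolist(), cols[keep].tolist()))
```

The all-pairs comparison is quadratic in the number of products, so it only suits small batches; at marketplace scale an approximate nearest-neighbor index would normally replace the dense similarity matrix.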
Related papers
- Transformer-empowered Multi-modal Item Embedding For Enhanced Image Search In E-commerce (2023) 4.52
- Multimodal Semantic Retrieval For Product Search (2025) 3.58
- ASR-enhanced Multimodal Representation Learning For Cross-domain Product Retrieval (2024) 0.00
- ACE-BERT: Adversarial Cross-modal Enhanced BERT For E-commerce Retrieval (2021) 0.00
- MRSE: An Efficient Multi-modality Retrieval System For Large Scale E-commerce (2024) 0.00
- Factorized Transport Alignment For Multimodal And Multiview E-commerce Representation Learning (2025) 0.00
- Turning Adversaries Into Allies: Reversing Typographic Attacks For Multimodal E-commerce Product Retrieval (2025) 0.00
- Specializing Joint Representations For The Task Of Product Recommendation (2017) 8.35