Smart Multi-modal Search: Contextual Sparse And Dense Embedding Integration In Adobe Express
2024 · Cherag Aroraa, Tracy Holloway King, Jayant Kumar, et al.
Abstract
As user content and queries become increasingly multi-modal, the need for effective multi-modal search systems has grown. Traditional search systems often rely on textual and metadata annotations for indexed images, while multi-modal embeddings like CLIP enable direct search using text and image embeddings. However, embedding-based approaches face challenges in integrating contextual features such as user locale and recency. Building a scalable multi-modal search system requires fine-tuning several components. This paper presents a multi-modal search architecture and a series of AB tests that optimize embeddings and multi-modal technologies in Adobe Express template search. We address considerations such as embedding model selection, the roles of embeddings in matching and ranking, and the balance between dense and sparse embeddings. Our iterative approach demonstrates how utilizing sparse, dense, and contextual features enhances short and long query search, significantly reduces null
Related papers
- MetaEmbed: Scaling Multimodal Retrieval At Test-time With Flexible Late Interaction (2025)
- RzenEmbed: Towards Comprehensive Multimodal Retrieval (2025)
- Embedding-based Retrieval In Multimodal Content Moderation (2025)
- CLaMR: Contextualized Late-interaction For Multimodal Content Retrieval (2025)
- Multimodal Contextualized Support For Enhancing Video Retrieval System (2026)
- Deepimagesearch: Benchmarking Multimodal Agents For Context-aware Image Retrieval In Visual Histories (2026)
- Compressible And Searchable: AI-native Multi-modal Retrieval System With Learned Image Compression (2024)
- MRSE: An Efficient Multi-modality Retrieval System For Large Scale E-commerce (2024)