Enhancing Product Search Interfaces With Sketch-guided Diffusion And Language Agents
2025 · Edward Sun
Abstract
The rapid progress in diffusion models, transformers, and language agents has unlocked new possibilities, yet their potential in user interfaces and commercial applications remains underexplored. We present Sketch-Search Agent, a novel framework that transforms the image search experience by integrating a multimodal language agent with freehand sketches as control signals for diffusion models. Using the T2I-Adapter, Sketch-Search Agent combines sketches and text prompts to generate high-quality query images, encoded via a CLIP image encoder for efficient matching against an image corpus. Unlike existing methods, Sketch-Search Agent requires minimal setup, no additional training, and excels in sketch-based image retrieval and natural language interactions. The multimodal agent enhances user experience by dynamically retaining preferences, ranking results, and refining queries for personalized recommendations. This interactive design empowers users to create sketches and receive tailored
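The matching step described in the abstract (a query image encoded and compared against an image corpus) can be illustrated with a minimal sketch. It assumes embeddings are already available as plain vectors; in the paper they would come from a CLIP image encoder applied to a T2I-Adapter output, neither of which is called here. The function names, file names, and toy vectors are hypothetical, and cosine similarity is an assumed choice of metric that the abstract does not specify.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_corpus(query_embedding, corpus_embeddings, top_k=3):
    """Return the top_k (image_id, score) pairs, best match first."""
    scored = [
        (image_id, cosine_similarity(query_embedding, embedding))
        for image_id, embedding in corpus_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors standing in for image embeddings (hypothetical data).
corpus = {
    "red_sneaker.jpg": [0.9, 0.1, 0.0],
    "blue_boot.jpg": [0.1, 0.9, 0.2],
    "red_sandal.jpg": [0.7, 0.2, 0.4],
}
# Stand-in for the embedding of the generated query image.
query = [1.0, 0.0, 0.1]

results = rank_corpus(query, corpus, top_k=2)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

In a real system the corpus embeddings would typically be computed once and stored in a vector index, so only the generated query image needs encoding at search time.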
Related papers
- LiveSketch: Query Perturbations For Guided Sketch-based Visual Search (2019) 12.47
- Text-to-image Diffusion Models Are Great Sketch-photo Matchmakers (2024) 9.41
- Image Retrieval With Mixed Initiative And Multimodal Feedback (2018) 8.09
- Text-guided Synthesis Of Artistic Images With Retrieval-augmented Diffusion Models (2022) 8.29
- A Sketch Is Worth A Thousand Words: Image Retrieval With Text And Sketch (2022) 10.35
- Composite Sketch+text Queries For Retrieving Objects With Elusive Names And Complex Interactions (2025) 5.84
- Sketch And Text Synergy: Fusing Structural Contours And Descriptive Attributes For Fine-grained Image Retrieval (2026) 0.00
- Diff-sbsr: Learning Multimodal Feature-enhanced Diffusion Models For Zero-shot Sketch-based 3D Shape Retrieval (2026) 0.00