Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation
2025 · Fulvio Sanguigni, Davide Morelli, Marcella Cornia, et al.
Abstract
In recent years, the fashion industry has increasingly adopted AI technologies to enhance customer experience, driven by the proliferation of e-commerce platforms and virtual applications. Among the various tasks, virtual try-on and multimodal fashion image editing -- which utilizes diverse input modalities such as text, garment sketches, and body poses -- have become a key area of research. Diffusion models have emerged as a leading approach for such generative tasks, offering superior image quality and diversity. However, most existing virtual try-on methods rely on having a specific garment input, which is often impractical in real-world scenarios where users may only provide textual specifications. To address this limitation, in this work we introduce Fashion Retrieval-Augmented Generation (Fashion-RAG), a novel method that enables the customization of fashion items based on user preferences provided in textual form. Our approach retrieves multiple garments that match the input spe
Related papers
- UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation (2024) 10.66
- Performance-Efficiency Trade-off for Fashion Image Retrieval (2025) 0.00
- FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning (2022) 7.81
- AR-RAG: Autoregressive Retrieval Augmentation for Image Generation (2025) 0.00
- ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation (2025) 0.00
- FashionBERT: Text and Image Matching with Adaptive Loss for Cross-Modal Retrieval (2020) 15.16
- Cross-modal RAG: Sub-dimensional Text-to-Image Retrieval-Augmented Generation (2025) 0.00
- Training and Challenging Models for Text-Guided Fashion Image Retrieval (2022) 0.00