Conversational Fashion Image Retrieval Via Multiturn Natural Language Feedback
2021 · Yifei Yuan, Wai Lam
Abstract
We study the task of conversational fashion image retrieval via multiturn natural language feedback. Most previous studies are based on single-turn settings. Existing models for multiturn conversational fashion image retrieval have limitations, such as relying on traditional models, which leads to ineffective performance. We propose a novel framework that can effectively handle conversational fashion image retrieval with multiturn natural language feedback texts. One characteristic of the framework is that it searches for candidate images by exploiting the encoded reference image and feedback text together with the conversation history. Furthermore, the image fashion attribute information is leveraged via a mutual attention strategy. Since no existing fashion dataset is suitable for the multiturn setting of our task, we derive a large-scale multiturn fashion dataset via additional manual annotation efforts on an existing single-turn dataset. The experiments s
Related papers
- FashionNTM: Multi-turn Fashion Image Retrieval Via Cascaded Memory (2023) · 5.24
- Fashion IQ: A New Dataset Towards Retrieving Images By Natural Language Feedback (2019) · 17.43
- Training And Challenging Models For Text-guided Fashion Image Retrieval (2022) · 0.00
- FAD-VLP: Fashion Vision-and-language Pre-training Towards Unified Retrieval And Captioning (2022) · 7.81
- UniFashion: A Unified Vision-language Model For Multimodal Fashion Retrieval And Generation (2024) · 10.66
- Fashionmv: Product-level Composed Image Retrieval With Multi-view Fashion Data (2026) · 2.98
- Modality-agnostic Attention Fusion For Visual Search With Text Feedback (2020) · 0.00
- Image Retrieval With Mixed Initiative And Multimodal Feedback (2018) · 8.09