FashionNTM: Multi-turn Fashion Image Retrieval Via Cascaded Memory
2023 · Anwesan Pal, Sahil Wadhwa, Ayush Jaiswal, et al.
Abstract
Multi-turn textual feedback-based fashion image retrieval focuses on a real-world setting, where users can iteratively provide information to refine retrieval results until they find an item that fits all their requirements. In this work, we present a novel memory-based method, called FashionNTM, for such a multi-turn system. Our framework incorporates a new Cascaded Memory Neural Turing Machine (CM-NTM) approach for implicit state management, thereby learning to integrate information across all past turns to retrieve new images for a given turn. Unlike the vanilla Neural Turing Machine (NTM), our CM-NTM operates on multiple inputs, which interact with their respective memories via individual read and write heads, to learn complex relationships. Extensive evaluation results show that our proposed method outperforms the previous state-of-the-art algorithm by 50.5% on Multi-turn FashionIQ, currently the only existing multi-turn fashion dataset, in addition to having a relative improvemen
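To make the memory mechanics in the abstract more concrete, here is a minimal sketch of NTM-style content-based addressing with one read/write head per memory. It is not the authors' implementation. The read and write rules follow the standard Neural Turing Machine formulation; everything else is an assumption for illustration: the function names, the fixed `beta` and erase values, and in particular the cascade rule in `cascaded_step`, where each stage's read vector is added to the next stage's key. In a trained model the keys, erase vectors, and add vectors would come from a learned controller.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)

def address(memory, key, beta=5.0):
    # Content-based addressing: softmax over scaled cosine similarity.
    return softmax([beta * cosine(row, key) for row in memory])

def read(memory, weights):
    # Read vector: weighted sum of the memory rows.
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory))
            for j in range(width)]

def write(memory, weights, erase, add):
    # NTM-style write: M[i] <- M[i] * (1 - w_i * e) + w_i * a
    return [[m * (1.0 - w * e) + w * a for m, e, a in zip(row, erase, add)]
            for w, row in zip(weights, memory)]

def cascaded_step(memories, inputs):
    # One turn over several (memory, input) pairs, each with its own head.
    # ASSUMPTION: the cascade passes each stage's read vector into the
    # next stage's key; the abstract does not specify this rule.
    carry = [0.0] * len(inputs[0])
    new_memories, reads = [], []
    for memory, x in zip(memories, inputs):
        key = [a + c for a, c in zip(x, carry)]
        w = address(memory, key)
        r = read(memory, w)
        # Fixed erase strength and add vector stand in for controller outputs.
        new_memories.append(write(memory, w, [0.5] * len(x), key))
        reads.append(r)
        carry = r
    return new_memories, reads

# Two memories of two slots each, one input vector per memory.
memories = [
    [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]],
    [[0.0, 0.0, 0.1], [0.1, 0.1, 0.0]],
]
inputs = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
memories, reads = cascaded_step(memories, inputs)
```

Calling `cascaded_step` once per feedback turn and reusing the returned memories is one way such a state could persist across turns without an explicit dialogue-state representation.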
Related papers
- Conversational Fashion Image Retrieval Via Multiturn Natural Language Feedback (2021) 11.85
- MMFL-Net: Multi-scale And Multi-granularity Feature Learning For Cross-domain Fashion Retrieval (2022) 5.84
- Training And Challenging Models For Text-guided Fashion Image Retrieval (2022) 0.00
- FashionMV: Product-level Composed Image Retrieval With Multi-view Fashion Data (2026) 2.98
- UniFashion: A Unified Vision-language Model For Multimodal Fashion Retrieval And Generation (2024) 10.66
- FaD-VLP: Fashion Vision-and-language Pre-training Towards Unified Retrieval And Captioning (2022) 7.81
- Fashion-RAG: Multimodal Fashion Image Editing Via Retrieval-augmented Generation (2025) 4.52
- FashionBERT: Text And Image Matching With Adaptive Loss For Cross-modal Retrieval (2020) 15.16