Self-distilled Dynamic Fusion Network For Language-based Fashion Retrieval
2024 · Yiming Wu, Hangfei Li, Fangfang Wang, et al.
Abstract
In the domain of language-based fashion image retrieval, pinpointing the desired fashion item using both a reference image and its accompanying textual description is an intriguing challenge. Existing approaches lean heavily on static fusion techniques, intertwining image and text. Despite their commendable advancements, these approaches are still limited by a deficiency in flexibility. In response, we propose a Self-distilled Dynamic Fusion Network to compose the multi-granularity features dynamically by considering the consistency of routing path and modality-specific information simultaneously. Two new modules are included in our proposed method: (1) Dynamic Fusion Network with Modality Specific Routers. The dynamic network enables a flexible determination of the routing for each reference image and modification text, taking into account their distinct semantics and distributions. (2) Self Path Distillation Loss. A stable path decision for queries benefits the optimization of featur
Related papers
- MMFL-Net: Multi-scale And Multi-granularity Feature Learning For Cross-domain Fashion Retrieval (2022) 5.84
- UniFashion: A Unified Vision-language Model For Multimodal Fashion Retrieval And Generation (2024) 10.66
- Modality-agnostic Attention Fusion For Visual Search With Text Feedback (2020) 0.00
- FaD-VLP: Fashion Vision-and-language Pre-training Towards Unified Retrieval And Captioning (2022) 7.81
- Training And Challenging Models For Text-guided Fashion Image Retrieval (2022) 0.00
- DAFM: Dynamic Adaptive Fusion For Multi-model Collaboration In Composed Image Retrieval (2025) 0.00
- A Hybrid Multimodal Deep Learning Framework For Intelligent Fashion Recommendation (2025) 0.00
- FashionBERT: Text And Image Matching With Adaptive Loss For Cross-modal Retrieval (2020) 15.16