FashionBERT: Text And Image Matching With Adaptive Loss For Cross-modal Retrieval
2020 · Dehong Gao, Linbo Jin, Ben Chen, et al.
Abstract
In this paper, we address text and image matching in cross-modal retrieval for the fashion industry. Unlike matching in the general domain, fashion matching must pay much more attention to the fine-grained information in fashion images and texts. Pioneering approaches detect regions of interest (RoIs) in images and use the RoI embeddings as image representations. In general, RoIs tend to represent "object-level" information in fashion images, while fashion texts tend to describe more detailed information, e.g., styles and attributes. RoIs are thus not fine-grained enough for fashion text and image matching. To this end, we propose FashionBERT, which leverages patches as image features. With the pre-trained BERT model as the backbone network, FashionBERT learns high-level representations of texts and images. Meanwhile, we propose an adaptive loss to trade off multitask learning in FashionBERT modeling. Two tasks (i.e., text and imag
Related papers
- From Region To Patch: Attribute-aware Foreground-background Contrastive Learning For Fine-grained Fashion Retrieval (2023) 10.00
- FaD-VLP: Fashion Vision-and-language Pre-training Towards Unified Retrieval And Captioning (2022) 7.81
- UniFashion: A Unified Vision-language Model For Multimodal Fashion Retrieval And Generation (2024) 10.66
- Fashion Image Retrieval With Multi-granular Alignment (2023) 0.00
- Training And Challenging Models For Text-guided Fashion Image Retrieval (2022) 0.00
- A Strong Baseline For Fashion Retrieval With Person Re-identification Models (2020) 8.09
- Fashion-RAG: Multimodal Fashion Image Editing Via Retrieval-augmented Generation (2025) 4.52
- ACE-BERT: Adversarial Cross-modal Enhanced BERT For E-commerce Retrieval (2021) 0.00