Fashion Focus: Multi-modal Retrieval System For Video Commodity Localization In E-commerce
2021 · Yanhao Zhang, Qiang Wang, Pan Pan, et al.
Abstract
Nowadays, live-stream and short-video shopping in E-commerce has grown exponentially. However, sellers must manually match images of the products they sell to the timestamps at which those products are exhibited in untrimmed videos, which is a laborious process. To solve this problem, we present a demonstration of an innovative multi-modal retrieval system called "Fashion Focus", which can precisely localize product images in online videos as the focuses. Different modalities contribute to commodity localization: visual content, linguistic features, and interaction context are jointly investigated via the presented multi-modal learning. Our system employs two analysis procedures, video content structuring and multi-modal retrieval, to automatically achieve accurate video-to-shop matching. Fashion Focus presents a unified framework that can orient consumers towards relevant product exhibitions while they watch videos and help sellers to effectively…
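The abstract describes matching a product image to the time span where the product is exhibited by jointly using visual, linguistic, and interaction cues. The paper's actual model is not described here, so the following is only a minimal sketch under stated assumptions: video content structuring has already split the video into clips, every modality of a clip has been encoded into the same embedding space as the product image, and the modalities are combined by a weighted sum of cosine similarities (simple late fusion, swapped in for illustration). `Segment`, `fused_score`, `localize`, and the weight values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Segment:
    """One structured clip of an untrimmed video (hypothetical representation)."""
    start_s: float  # clip start, in seconds
    end_s: float    # clip end, in seconds
    # Per-modality embeddings, ASSUMED to share one space with the product embedding.
    embeddings: Dict[str, Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fused_score(segment: Segment, product: Vector, weights: Dict[str, float]) -> float:
    """Late fusion: weighted sum of per-modality similarities; missing modalities add 0."""
    return sum(
        w * cosine(segment.embeddings[m], product)
        for m, w in weights.items()
        if m in segment.embeddings
    )


def localize(segments: List[Segment], product: Vector,
             weights: Dict[str, float]) -> Tuple[float, float]:
    """Return the (start, end) time span of the highest-scoring clip."""
    best = max(segments, key=lambda s: fused_score(s, product, weights))
    return best.start_s, best.end_s


# Toy usage with hypothetical 2-D embeddings and hypothetical weights.
weights = {"visual": 0.6, "text": 0.3, "interaction": 0.1}
segments = [
    Segment(0.0, 12.5, {"visual": [1.0, 0.0], "text": [0.0, 1.0], "interaction": [0.5, 0.5]}),
    Segment(12.5, 30.0, {"visual": [0.9, 0.1], "text": [1.0, 0.0], "interaction": [1.0, 0.0]}),
]
product = [1.0, 0.0]  # hypothetical embedding of the product image
print(localize(segments, product, weights))  # → (12.5, 30.0)
```

In this toy run the first clip matches visually but not textually, so the second clip, which agrees across all three modalities, is returned; a learned fusion model would replace the fixed weights.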
Related papers
- Multi-queue Momentum Contrast For Microvideo-product Retrieval (2022) · 11.50
- Fashionmv: Product-level Composed Image Retrieval With Multi-view Fashion Data (2026) · 2.98
- Multimodal Contextualized Support For Enhancing Video Retrieval System (2026) · 0.00
- Neural Graph Matching For Video Retrieval In Large-scale Video-driven E-commerce (2024) · 0.00
- Factorized Transport Alignment For Multimodal And Multiview E-commerce Representation Learning (2025) · 0.00
- Zero-shot Retrieval For Scalable Visual Search In A Two-sided Marketplace (2025) · 1.57
- MAKE: Vision-language Pre-training Based Product Retrieval In Taobao Search (2023) · 7.81
- Multimodal Semantic Retrieval For Product Search (2025) · 3.58