OneVision: An End-to-end Generative Framework For Multi-view E-commerce Vision Search
2025 · Zexin Zheng, Huangyu Dai, Lingtao Mao, et al.
Abstract
Traditional vision search, like search and recommendation systems, follows the multi-stage cascading architecture (MCA) paradigm to balance efficiency and conversion. Specifically, the query image passes through feature extraction, recall, pre-ranking, and ranking stages, which ultimately present the user with semantically similar products that match their preferences. However, the same object can have very different representations across the multiple views in which it appears in queries, and the optimization objectives of the individual stages conflict with one another; together, these make it difficult to achieve Pareto optimality in both user experience and conversion. In this paper, an end-to-end generative framework, OneVision, is proposed to address these problems. OneVision builds on VRQ, a vision-aligned residual quantization encoding that aligns the vastly different representations of an object across multiple viewpoints while preserving the distinctive features of each product as much as possible. Then a multi-stage semantic alignment scheme is adopted to m
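VRQ is described as a residual quantization encoding. As a rough illustration of what residual quantization does in general (not VRQ itself, whose vision-alignment objective is not shown here), the sketch below encodes an embedding as a short tuple of codebook indices, coarse to fine: each level picks the nearest codeword to the current residual and subtracts it. The codebooks are small hand-picked assumptions rather than learned ones, and the names `residual_quantize` and `reconstruct` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Codebook = Sequence[Sequence[float]]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(
    x: Sequence[float], codebooks: Sequence[Codebook]
) -> Tuple[List[int], Vector]:
    """Generic residual quantization (illustrative, not VRQ).

    At each level, choose the codeword nearest to the current residual,
    record its index, and subtract it. Returns the code tuple and the
    final residual (the part the codebooks could not represent).
    """
    residual = list(x)
    codes: List[int] = []
    for codebook in codebooks:
        idx = min(range(len(codebook)), key=lambda i: _sq_dist(residual, codebook[i]))
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, residual


def reconstruct(codes: Sequence[int], codebooks: Sequence[Codebook]) -> Vector:
    """Approximate the original vector by summing the chosen codewords."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical 2-D codebooks with shrinking scales (coarse -> fine).
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0]],
    [[0.01, 0.01], [0.0, 0.0], [-0.01, 0.0]],
]

if __name__ == "__main__":
    embedding = [1.11, 1.01]
    codes, residual = residual_quantize(embedding, CODEBOOKS)
    print(codes)  # [1, 2, 0]
    print(reconstruct(codes, CODEBOOKS))
```

The resulting index tuple acts like a discrete item ID whose prefix captures coarse semantics, which is what lets a generative model emit product identifiers token by token. How OneVision aligns such codes across viewpoints is specific to the paper and not reproduced here.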
Related papers
- Zero-shot Retrieval For Scalable Visual Search In A Two-sided Marketplace (2025) · 1.57
- V²L: Leveraging Vision And Vision-language Models Into Large-scale Product Retrieval (2022) · 0.00
- Deep Learning Based Large Scale Visual Recommendation And Search For E-commerce (2017) · 0.00
- From Pixels To Purchase: Building And Evaluating A Taxonomy-decoupled Visual Search Engine For Home Goods E-commerce (2026) · 0.00
- Factorized Transport Alignment For Multimodal And Multiview E-commerce Representation Learning (2025) · 0.00
- Unicvr: From Alignment To Reranking For Unified Zero-shot Composed Visual Retrieval (2026) · 0.00
- Retrieval-guided Cross-view Image Synthesis (2024) · 0.00
- Fashionmv: Product-level Composed Image Retrieval With Multi-view Fashion Data (2026) · 2.98