Retrieval-GRPO: A Multi-objective Reinforcement Learning Framework For Dense Retrieval In Taobao Search
2025 · Xingxian Liu, Dongshuai Li, Jiahui Wan, et al.
Abstract
Dense retrieval, as the core component of e-commerce search engines, maps user queries and items into a unified semantic space through pre-trained embedding models to enable large-scale real-time semantic retrieval. Despite the rapid advancement of LLMs gradually replacing traditional BERT architectures for embedding, their training paradigms still adhere to BERT-like supervised fine-tuning and hard negative mining strategies. This approach relies on complex offline hard negative sample construction pipelines, which constrain model iteration efficiency and hinder the evolutionary potential of semantic representation capabilities. Besides, existing multi-task learning frameworks face the seesaw effect when simultaneously optimizing semantic relevance and non-relevance objectives. In this paper, we propose Retrieval-GRPO, a multi-objective reinforcement learning-based dense retrieval framework designed to address these challenges. The method eliminates offline hard negative sample constr
Related papers
- Multi-objective Personalized Product Retrieval In Taobao Search (2022)
- Embedding-based Product Retrieval In Taobao Search (2021)
- Graph Contrastive Learning With Multi-objective For Personalized Product Retrieval In Taobao Search (2023)
- Large Reasoning Embedding Models: Towards Next-generation Dense Retrieval Paradigm (2025)
- Mine And Refine: Optimizing Graded Relevance In E-commerce Search Retrieval (2026)
- MRSE: An Efficient Multi-modality Retrieval System For Large Scale E-commerce (2024)
- Delving Into E-commerce Product Retrieval With Vision-language Pre-training (2023)
- GRIT: Graph-based Recall Improvement For Task-oriented E-commerce Queries (2025)