RAD-DPO: Robust Adaptive Denoising Direct Preference Optimization For Generative Retrieval In E-commerce
2026 · Zhiguo Chen, Guohao Sun, Yiming Qiu, et al.
Abstract
arXiv:2602.23964v2
Generative Retrieval (GR) is rapidly transforming e-commerce search by replacing traditional multi-stage pipelines with the autoregressive decoding of structured Semantic IDs (SIDs). Despite this architectural efficiency, aligning GR models with nuanced, real-world user preferences remains a critical challenge. While Direct Preference Optimization (DPO) offers an efficient alignment solution, its direct application to structured SIDs suffers from three limitations: (i) it penalizes shared hierarchical prefixes, causing gradient conflicts; (ii) it is vulnerable to noisy pseudo-negatives from implicit feedback; and (iii) in multi-label queries with multiple relevant items, it exacerbates a probability "squeezing effect" among valid candidates. To address these issues, we propose RAD-DPO, which introduces token-level gradient detachment to protect prefix structures, similarity-based dynamic reward weighting to mitigate label noise, and
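To make limitation (i) concrete, the following is a minimal sketch of the standard DPO loss for one preference pair, plus a mask marking which rejected SID tokens lie in the prefix shared with the chosen SID. It is not the paper's implementation: the function names, the example SIDs, and `beta=0.1` are all assumptions for illustration. In an autograd framework, the masked positions are the ones whose log-probabilities one might detach so the rejected sample does not push down a prefix the chosen sample also needs; here only the mask and the loss value are computed.

```python
import math

def shared_prefix_len(chosen, rejected):
    """Number of leading SID tokens the two sequences have in common."""
    n = 0
    for a, b in zip(chosen, rejected):
        if a != b:
            break
        n += 1
    return n

def dpo_loss(pol_w, pol_l, ref_w, ref_l, beta=0.1):
    """Standard DPO loss for one pair.

    Each argument is a list of per-token log-probs: policy/reference model
    on the chosen (w) and rejected (l) sequence. beta is a hypothetical value.
    """
    margin = (sum(pol_w) - sum(ref_w)) - (sum(pol_l) - sum(ref_l))
    z = beta * margin
    # -log(sigmoid(z)), written in a numerically stable form
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# Hypothetical 3-level SIDs that share their first two levels.
chosen = [12, 7, 3]
rejected = [12, 7, 9]
k = shared_prefix_len(chosen, rejected)

# True marks rejected-side tokens inside the shared prefix
# (candidates for gradient detachment in an autograd framework).
detach_mask = [i < k for i in range(len(rejected))]
```

With these example SIDs only the last rejected token falls outside the shared prefix, so it is the only position where chosen and rejected actually disagree.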
Related papers
- Differentiable Geometric Indexing For End-to-end Generative Retrieval (2026) 0.00
- Lightweight And Direct Document Relevance Optimization For Generative Information Retrieval (2025) 4.52
- RADAR: Recall Augmentation Through Deferred Asynchronous Retrieval (2025) 2.26
- Generative Retrieval Meets Multi-graded Relevance (2024) 2.26
- Planning Ahead In Generative Retrieval: Guiding Autoregressive Generation Through Simultaneous Decoding (2024) 8.82
- Retrieval-GRPO: A Multi-objective Reinforcement Learning Framework For Dense Retrieval In Taobao Search (2025) 0.00
- Breaking The Hourglass Phenomenon Of Residual Quantization: Enhancing The Upper Bound Of Generative Retrieval (2024) 4.52
- Does Generative Retrieval Overcome The Limitations Of Dense Retrieval? (2025) 0.00