Sustainable Online Reinforcement Learning For Auto-bidding
2022 · Zhiyu Mou, Yusen Huo, Rongquan Bai, et al.
Abstract
Recently, auto-bidding has become an essential tool for increasing advertisers' revenue. Facing the complex and ever-changing bidding environments of the real-world advertising system (RAS), state-of-the-art auto-bidding policies usually leverage reinforcement learning (RL) algorithms to generate real-time bids on behalf of advertisers. Due to safety concerns, it was believed that RL training can only be carried out in an offline virtual advertising system (VAS) built from historical data generated in the RAS. In this paper, we argue that there exist significant gaps between the VAS and the RAS, which cause the RL training process to suffer from inconsistency between online and offline (IBOO). First, we formally define the IBOO and systematically analyze its causes and influences. Then, to avoid the IBOO, we propose a sustainable online RL (SORL) framework that trains the auto-bidding policy by directly interacting with the RAS, instea
Related papers
- Accelerating Offline Reinforcement Learning Application In Real-time Bidding And Recommendation: Potential Use Of Simulation (2021)
- AWAC: Accelerating Online Reinforcement Learning With Offline Datasets (2020)
- Active Advantage-aligned Online Reinforcement Learning With Offline Data (2025)
- Beyond OOD State Actions: Supported Cross-domain Offline Reinforcement Learning (2023)
- Towards Robust Offline-to-online Reinforcement Learning Via Uncertainty And Smoothness (2023)
- Towards Data-driven Offline Simulations For Online Reinforcement Learning (2022)
- Towards Fast Safe Online Reinforcement Learning Via Policy Finetuning (2024)
- Data Valuation For Offline Reinforcement Learning (2022)