Switching the Loss Reduces the Cost in Batch (Offline) Reinforcement Learning
2024 · Alex Ayoub, Kaiwen Wang, Vincent Liu, et al.
Abstract
We propose training fitted Q-iteration with log-loss (FQI-log) for batch reinforcement learning (RL). We show that the number of samples needed to learn a near-optimal policy with FQI-log scales with the accumulated cost of the optimal policy, which is zero in problems where acting optimally achieves the goal and incurs no cost. In doing so, we provide a general framework for proving small-cost bounds, i.e. bounds that scale with the optimal achievable cost, in batch RL. Moreover, we empirically verify that FQI-log uses fewer samples than FQI trained with squared loss on problems where the optimal policy reliably achieves the goal.
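The loss switch described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that cumulative costs are normalized to lie in [0, 1], that there is no discounting, and that the regression target for each transition is the immediate cost plus the smallest predicted cost-to-go at the next state. The function names (squared_loss, log_loss, bellman_targets) are hypothetical and chosen only for illustration.

```python
import numpy as np


def squared_loss(pred, target):
    """Per-sample squared loss, the usual choice in fitted Q-iteration."""
    return (pred - target) ** 2


def log_loss(pred, target, eps=1e-12):
    """Per-sample log-loss (binary cross-entropy) with a soft target in [0, 1].

    Predictions are clipped away from 0 and 1 so the logarithms stay finite.
    """
    pred = np.clip(pred, eps, 1.0 - eps)
    return -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))


def bellman_targets(cost, next_q):
    """Regression targets for one iteration in a cost-minimization setting.

    cost:   shape (n,), immediate cost of each sampled transition
    next_q: shape (n, num_actions), previous iterate evaluated at next states
    Assumption: undiscounted, with cost-to-go normalized to [0, 1].
    """
    return np.clip(cost + next_q.min(axis=1), 0.0, 1.0)


# Toy batch of two transitions with two actions available at the next state.
cost = np.array([0.0, 0.2])
next_q = np.array([[0.1, 0.3],
                   [0.5, 0.4]])
targets = bellman_targets(cost, next_q)

# Compare the two losses for the same (arbitrary) predictions.
pred = np.array([0.05, 0.70])
print("targets      :", targets)
print("squared loss :", squared_loss(pred, targets))
print("log-loss     :", log_loss(pred, targets))
```

Both losses are minimized when the prediction equals the target, so they differ in how they weight errors rather than in what they fit. For a fixed target, the curvature of the log-loss in the prediction grows as the prediction approaches 0, so errors near zero cost are penalized more sharply than under squared loss. This is only an informal intuition for the small-cost behavior the abstract describes, not a restatement of the paper's analysis.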
Related papers
- Regret-optimal Q-learning With Low Cost For Single-agent And Federated Reinforcement Learning (2025)
- Cal-QL: Calibrated Offline RL Pre-training For Efficient Online Fine-tuning (2023)
- Pessimistic Q-learning For Offline Reinforcement Learning: Towards Optimal Sample Complexity (2022)
- Fast Rates For The Regret Of Offline Reinforcement Learning (2021)
- Boosting Offline Reinforcement Learning With Residual Generative Modeling (2021)
- Q* Approximation Schemes For Batch Reinforcement Learning: A Theoretical Comparison (2020)
- Mildly Conservative Q-learning For Offline Reinforcement Learning (2022)
- Quantile Q-learning: Revisiting Offline Extreme Q-learning With Quantile Regression (2025)