Offline Policy Optimization In RL With Variance Regularization
2022 · Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, et al.
Abstract
Learning policies from fixed offline datasets is a key challenge in scaling up reinforcement learning (RL) algorithms towards practical applications. This is often because off-policy RL algorithms suffer from distributional shift, due to the mismatch between the dataset and the target policy, leading to high variance and over-estimation of value functions. In this work, we propose variance regularization for offline RL algorithms, using stationary distribution corrections. We show that by using Fenchel duality, we can avoid double sampling issues when computing the gradient of the variance regularizer. The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithm. We show that the regularizer leads to a lower bound on the offline policy optimization objective, which can help avoid over-estimation errors, and explains the benefits of our approach across a range of continuous control domains when compared to existing sta
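The double-sampling point in the abstract can be illustrated on a toy problem. The sketch below is not the paper's OVAR implementation; it only shows the general Fenchel-duality identity (E[x])^2 = max over nu of (2 nu E[x] - nu^2), which turns the variance into a single expectation of per-sample terms. The array `x` is an assumed stand-in for per-sample quantities such as importance-weighted rewards, and the function names `variance_direct`, `variance_dual`, and `fit_nu` are hypothetical.

```python
import numpy as np


def variance_direct(x):
    # Var[x] = E[x^2] - (E[x])^2. The squared-mean term is a product of two
    # expectations, so an unbiased minibatch gradient of it would need two
    # independent samples (the double-sampling issue).
    return np.mean(x ** 2) - np.mean(x) ** 2


def variance_dual(x, nu):
    # Dual form: E[x^2 - 2*nu*x + nu^2]. Every term is a per-sample quantity,
    # so one minibatch gives an unbiased estimate of the objective and of its
    # gradient. The value equals Var[x] when nu = E[x] and is larger otherwise.
    return np.mean(x ** 2 - 2.0 * nu * x + nu ** 2)


def fit_nu(x, steps=200, lr=0.1, batch_size=32, seed=0):
    # Stochastic gradient descent on the dual variable nu (assumed
    # hyperparameters, chosen only for this toy example).
    rng = np.random.default_rng(seed)
    nu = 0.0
    for _ in range(steps):
        batch = rng.choice(x, size=batch_size)
        grad = np.mean(2.0 * nu - 2.0 * batch)  # d/d(nu) of the per-sample term
        nu -= lr * grad
    return nu


# Synthetic data standing in for per-sample values (an assumption).
x = np.random.default_rng(1).normal(loc=1.5, scale=2.0, size=10_000)
nu_hat = fit_nu(x)
print("direct variance:", variance_direct(x))
print("dual at fitted nu:", variance_dual(x, nu_hat))
```

Because the dual objective is minimized over `nu`, any fixed `nu` gives a value at least as large as the true variance, and the gap is exactly `(E[x] - nu)^2`.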
Related papers
- Near-optimal Offline Reinforcement Learning Via Double Variance Reduction (2021) · 0.00
- Regularizing A Model-based Policy Stationary Distribution To Stabilize Offline Reinforcement Learning (2022) · 0.00
- OptiDICE: Offline Policy Optimization Via Stationary Distribution Correction Estimation (2021) · 0.00
- Federated Offline Policy Optimization With Dual Regularization (2024) · 3.58
- Adaptive Advantage-guided Policy Regularization For Offline Reinforcement Learning (2024) · 3.09
- Iteratively Refined Behavior Regularization For Offline Reinforcement Learning (2023) · 2.26
- Offline Policy Evaluation For Reinforcement Learning With Adaptively Collected Data (2023) · 0.00
- Offline RL With No OOD Actions: In-sample Learning Via Implicit Value Regularization (2023) · 0.00