Offline Reinforcement Learning With Fisher Divergence Critic Regularization
2021 · Ilya Kostrikov, Jonathan Tompson, Rob Fergus, et al.
Abstract
Many modern approaches to offline Reinforcement Learning (RL) utilize behavior regularization, typically augmenting a model-free actor-critic algorithm with a penalty measuring divergence of the policy from the offline data. In this work, we propose an alternative approach to encouraging the learned policy to stay close to the data, namely parameterizing the critic as the log-behavior-policy, which generated the offline data, plus a state-action value offset term, which can be learned using a neural network. Behavior regularization then corresponds to an appropriate regularizer on the offset term. We propose using a gradient penalty regularizer for the offset term and demonstrate its equivalence to Fisher divergence regularization, suggesting connections to the score matching and generative energy-based model literature. We thus term our resulting algorithm Fisher-BRC (Behavior Regularized Critic). On standard offline RL benchmarks, Fisher-BRC achieves both improved performance and fas…
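A minimal sketch of the critic parameterization described in the abstract, Q(s, a) = log μ(a|s) + O(s, a), together with a gradient penalty on the offset O. It rests on assumptions that are not from the paper: a 1-D toy problem, a Gaussian behavior policy whose density is known exactly (in practice μ would have to be estimated from the offline data), and a hand-written quadratic offset standing in for the neural network. All names (behavior_mean, offset, gradient_penalty, critic_loss, BEHAVIOR_STD) are hypothetical, and the TD target and the actions at which the penalty is evaluated are illustrative choices rather than the authors' exact algorithm. The code makes explicit the identity ∂Q/∂a − ∂log μ/∂a = ∂O/∂a: penalizing the squared action-gradient of O therefore penalizes the squared difference between the action-scores of Q and of log μ, which is the kind of quantity a Fisher divergence measures.

```python
import math

# Hypothetical 1-D toy problem for illustration only: the behavior policy
# mu(a|s) is assumed to be a Gaussian with fixed standard deviation, and a
# small hand-written model stands in for the neural-network offset O(s, a).
BEHAVIOR_STD = 0.5


def behavior_mean(s: float) -> float:
    """Hypothetical mean of the behavior policy mu(a|s)."""
    return 0.3 * s


def log_behavior(s: float, a: float) -> float:
    """log mu(a|s) for a Gaussian N(behavior_mean(s), BEHAVIOR_STD**2)."""
    z = (a - behavior_mean(s)) / BEHAVIOR_STD
    return -0.5 * z * z - math.log(BEHAVIOR_STD * math.sqrt(2.0 * math.pi))


def log_behavior_grad_a(s: float, a: float) -> float:
    """Analytic d/da log mu(a|s) for the Gaussian above (the behavior score)."""
    return -(a - behavior_mean(s)) / BEHAVIOR_STD ** 2


def offset(w, s: float, a: float) -> float:
    """Hypothetical offset O_w(s, a), quadratic in the action."""
    return w[0] + w[1] * s + w[2] * a + w[3] * a * a + w[4] * s * a


def offset_grad_a(w, s: float, a: float) -> float:
    """Analytic dO_w/da for the quadratic offset above."""
    return w[2] + 2.0 * w[3] * a + w[4] * s


def critic(w, s: float, a: float) -> float:
    """Critic parameterized as Q(s, a) = log mu(a|s) + O_w(s, a)."""
    return log_behavior(s, a) + offset(w, s, a)


def gradient_penalty(w, states, actions) -> float:
    """Batch mean of (dO_w/da)^2, the gradient penalty on the offset."""
    sq = [offset_grad_a(w, s, a) ** 2 for s, a in zip(states, actions)]
    return sum(sq) / len(sq)


def critic_loss(w, batch, next_actions, penalty_actions, gamma=0.99, lam=0.1):
    """Mean squared TD error plus lam * gradient penalty (loss value only).

    batch holds (s, a, r, s_next) tuples; next_actions and penalty_actions are
    assumed to be sampled from the learned policy. A real implementation would
    use a target network and stop gradients through the TD target.
    """
    td = []
    for (s, a, r, s_next), a_next in zip(batch, next_actions):
        target = r + gamma * critic(w, s_next, a_next)
        td.append((critic(w, s, a) - target) ** 2)
    states = [s for s, _, _, _ in batch]
    return sum(td) / len(td) + lam * gradient_penalty(w, states, penalty_actions)


if __name__ == "__main__":
    w = [0.1, -0.2, 0.05, -0.3, 0.2]
    batch = [(0.0, 0.1, 1.0, 0.5), (1.0, 0.4, 0.0, 1.2)]
    print(critic_loss(w, batch, next_actions=[0.2, 0.3], penalty_actions=[0.0, 0.5]))
```

Note the design choice this illustrates: the behavior term log μ is fixed and only the offset carries learnable parameters, so the regularizer acts on O alone. If O is penalized toward a flat function of the action, the critic's action-gradient falls back to the behavior policy's score.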
Related papers
- BRAC+: Improved Behavior Regularized Actor Critic For Offline Reinforcement Learning (2021)
- Iteratively Refined Behavior Regularization For Offline Reinforcement Learning (2023)
- FOCAL: Efficient Fully-offline Meta-reinforcement Learning Via Distance Metric Learning And Behavior Regularization (2020)
- A Behavior Regularized Implicit Policy For Offline Reinforcement Learning (2022)
- Diffusion Actor-critic: Formulating Constrained Policy Iteration As Diffusion Noise Regression For Offline Reinforcement Learning (2024)
- Offline Policy Optimization In RL With Variance Regularizaton (2022)
- Robust Offline Reinforcement Learning With Gradient Penalty And Constraint Relaxation (2022)
- Federated Offline Policy Optimization With Dual Regularization (2024)