Neural Network Compatible Off-policy Natural Actor-critic Algorithm
2021 · Raghuram Bharadwaj Diddigi, Prateek Jain, Prabuchandran K. J., et al.
Abstract
Learning optimal behavior from existing data is one of the most important problems in Reinforcement Learning (RL). This is known as "off-policy control" in RL, where an agent's objective is to compute an optimal policy from data generated by a given policy (known as the behavior policy). Because the optimal policy can be very different from the behavior policy, learning optimal behavior is much harder in the "off-policy" setting than in the "on-policy" setting, where new data from the updated policy is used in learning. This work proposes an off-policy natural actor-critic algorithm that uses state-action distribution correction to handle the off-policy behavior and the natural policy gradient for sample efficiency. Existing natural gradient-based actor-critic algorithms with convergence guarantees require fixed features for approximating both the policy and the value function. This often leads to sub-optimal learning in many RL applications. On the other hand, our p
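The two ingredients named in the abstract, distribution correction and the natural policy gradient, can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a single-state problem (a bandit) with known deterministic rewards and a tabular softmax policy, so the state-action distribution ratio reduces to the action-probability ratio between the target and behavior policies. The names `off_policy_nac`, `compatible_features`, `alpha`, and `beta` are hypothetical. It relies on the standard result that, with compatible features (the gradient of the log-policy), the critic weights that fit the advantage also give the natural gradient direction for the actor.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def compatible_features(theta, action):
    """grad_theta log pi(action) for a softmax policy: one_hot(action) - pi."""
    pi = softmax(theta)
    return [(1.0 if i == action else 0.0) - pi[i] for i in range(len(theta))]


def off_policy_nac(rewards, behavior, steps=5000, alpha=0.05, beta=0.1, seed=0):
    """Toy off-policy natural actor-critic on a single-state problem.

    Assumptions (not from the paper): rewards are known and deterministic,
    actions are sampled from a fixed behavior distribution, and the
    correction ratio is pi(a) / behavior(a).
    """
    rng = random.Random(seed)
    n = len(rewards)
    theta = [0.0] * n  # actor parameters (softmax preferences)
    w = [0.0] * n      # critic weights on the compatible features
    actions = list(range(n))
    for _ in range(steps):
        # Data comes from the behavior policy, not the target policy.
        a = rng.choices(actions, weights=behavior)[0]
        pi = softmax(theta)
        rho = pi[a] / behavior[a]  # distribution correction ratio
        baseline = sum(p * r for p, r in zip(pi, rewards))
        advantage = rewards[a] - baseline
        psi = compatible_features(theta, a)
        predicted = sum(wi * pj for wi, pj in zip(w, psi))
        # Critic: ratio-weighted stochastic gradient step on the squared
        # error between the advantage and its linear fit.
        error = advantage - predicted
        w = [wi + beta * rho * error * pj for wi, pj in zip(w, psi)]
        # Actor: with compatible features, w is the natural gradient direction.
        theta = [t + alpha * wi for t, wi in zip(theta, w)]
    return softmax(theta)


if __name__ == "__main__":
    probs = off_policy_nac([1.0, 0.0, 0.5], behavior=[0.2, 0.5, 0.3])
    print(probs)
```

Here the behavior distribution favors the worst action, yet the correction ratio reweights each sample as if it had been drawn from the current policy, so the learned policy should still concentrate on the highest-reward action. In a multi-state problem the ratio would have to be estimated rather than computed in closed form, which this sketch does not attempt.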
Related papers
- Mitigating Off-policy Bias In Actor-critic Methods With One-step Q-learning: A Novel Correction Approach (2022)
- A Multi-agent Off-policy Actor-critic Algorithm For Distributed Reinforcement Learning (2019)
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018)
- Revisiting Stochastic Off-policy Action-value Gradients (2017)
- Doubly Robust Off-policy Actor-critic Algorithms For Reinforcement Learning (2019)
- An Approximate Policy Iteration Viewpoint Of Actor-critic Algorithms (2022)
- Behavior-guided Actor-critic: Improving Exploration Via Learning Policy Behavior Representation For Deep Reinforcement Learning (2021)
- Offline-boosted Actor-critic: Adaptively Blending Optimal Historical Behaviors In Deep Off-policy RL (2024)