Deep Primal-dual Reinforcement Learning: Accelerating Actor-critic Using Bellman Duality
2017 · Woon Sang Cho, Mengdi Wang
Abstract
We develop a parameterized Primal-Dual \(\pi\) Learning method based on deep neural networks for Markov decision processes with large state spaces and for off-policy reinforcement learning. In contrast to the popular Q-learning and actor-critic methods, which are based on successive approximations to the nonlinear Bellman equation, our method makes primal-dual updates to the policy and value functions using the fundamental linear Bellman duality. Naive parameterization of the primal-dual \(\pi\) learning method using deep neural networks would encounter two major challenges: (1) each update requires computing a probability distribution over the state space, which is intractable; (2) the iterates are unstable because the parameterized Lagrangian function is no longer linear. We address these challenges by proposing a relaxed Lagrangian formulation with a regularization penalty using the advantage function. We show that the dual policy update step in our method is equivalent to the policy gradient
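The linear Bellman duality the abstract builds on can be illustrated in the small tabular case. The sketch below is an assumption-laden illustration, not the authors' deep, relaxed-Lagrangian method: it assumes the discounted-reward linear program (the paper's exact setting may differ), with Lagrangian \(L(v,\mu) = (1-\gamma)\,q^\top v + \sum_{i,a} \mu_{i,a}\big(r_{i,a} + \gamma (P_a v)_i - v_i\big)\), where \(q\) is an initial-state distribution and \(\mu\) lies on the simplex over state-action pairs. The gradient in \(\mu\) is the Bellman residual, which at the optimal \(v\) is the advantage function. The loop takes projected gradient descent steps on \(v\) and exponentiated-gradient ascent steps on \(\mu\), then reads a policy off the averaged \(\mu\). All function names, the two-state MDP, and the step sizes are hypothetical.

```python
import numpy as np


def advantage(v, P, r, gamma):
    """Bellman residual r[i, a] + gamma * (P[a] @ v)[i] - v[i], shape (S, A).

    P has shape (A, S, S) with P[a, i, j] = Pr(j | i, a); r has shape (S, A).
    This is dL/dmu; at the optimal v it equals the advantage function A*(i, a).
    """
    return r + gamma * np.einsum("aij,j->ia", P, v) - v[:, None]


def lagrangian(v, mu, P, r, q, gamma):
    """L(v, mu) = (1 - gamma) * q.v + sum_{i,a} mu[i, a] * advantage[i, a]."""
    return (1 - gamma) * q @ v + np.sum(mu * advantage(v, P, r, gamma))


def primal_dual(P, r, q, gamma, steps=10000, alpha=0.2, beta=0.02):
    """Full-gradient primal-dual iteration on L (hypothetical step sizes).

    v: projected gradient descent on [0, max(r) / (1 - gamma)] (assumes r >= 0).
    mu: exponentiated-gradient ascent on the simplex over (state, action) pairs.
    Returns the last v, the averaged mu, and the policy argmax_a mu_avg[i, a].
    """
    n_actions, n_states, _ = P.shape
    v_max = r.max() / (1 - gamma)
    v = np.zeros(n_states)
    theta = np.zeros((n_states, n_actions))  # log-weights of mu, uniform start
    mu = np.full((n_states, n_actions), 1.0 / (n_states * n_actions))
    mu_avg = np.zeros_like(mu)
    for _ in range(steps):
        adv = advantage(v, P, r, gamma)  # dual gradient at the current v
        grad_v = ((1 - gamma) * q
                  + gamma * np.einsum("ia,aij->j", mu, P)
                  - mu.sum(axis=1))      # primal gradient at the current mu
        v = np.clip(v - alpha * grad_v, 0.0, v_max)
        theta += beta * adv
        mu = np.exp(theta - theta.max())
        mu /= mu.sum()
        mu_avg += mu
    mu_avg /= steps
    return v, mu_avg, mu_avg.argmax(axis=1)


def value_iteration(P, r, gamma, tol=1e-10):
    """Reference solution of the nonlinear Bellman equation."""
    v = np.zeros(P.shape[1])
    while True:
        v_new = (r + gamma * np.einsum("aij,j->ia", P, v)).max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new


def occupancy(P, policy, q, gamma):
    """Normalized discounted state-action occupancy of a deterministic policy."""
    n_actions, n_states, _ = P.shape
    P_pi = P[policy, np.arange(n_states)]  # row i is P[policy[i], i, :]
    d = np.linalg.solve(np.eye(n_states) - gamma * P_pi.T, (1 - gamma) * q)
    mu = np.zeros((n_states, n_actions))
    mu[np.arange(n_states), policy] = d
    return mu


# Hypothetical two-state MDP: action 0 = stay, action 1 = switch state.
# Only "stay" in state 1 earns reward 1, so the optimal policy is [switch, stay].
gamma = 0.9
P = np.stack([np.eye(2), np.array([[0.0, 1.0], [1.0, 0.0]])])
r = np.array([[0.0, 0.0], [1.0, 0.0]])
q = np.array([0.5, 0.5])

v_star = value_iteration(P, r, gamma)                    # about [9, 10]
pi_star = advantage(v_star, P, r, gamma).argmax(axis=1)  # greedy policy
mu_star = occupancy(P, pi_star, q, gamma)
primal = (1 - gamma) * q @ v_star   # primal LP objective
dual = np.sum(mu_star * r)          # dual LP objective (normalized reward)
lag = lagrangian(v_star, mu_star, P, r, q, gamma)
print(round(float(primal), 4), round(float(dual), 4), round(float(lag), 4))  # 0.95 0.95 0.95

_, mu_avg, pi_pd = primal_dual(P, r, q, gamma)
print(pi_star, pi_pd)  # pi_star is [1 0]; pi_pd is only approximately converged
```

Because `mu_star` is the occupancy measure of the value-iteration policy, the three printed numbers illustrate strong duality: the primal objective, the dual objective, and the Lagrangian coincide at the saddle point. Each iteration of `primal_dual` touches the full \(|S| \times |A|\) table for \(\mu\) and the full transition tensor, which is the kind of per-step computation the abstract flags as intractable for large state spaces. With constant step sizes only the averaged \(\mu\) is expected to approach the saddle point, so `pi_pd` should be checked against `pi_star` rather than assumed equal.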
Related papers
- Primal-dual \(\pi\) Learning: Sample Complexity And Sublinear Run Time For Ergodic Markov Decision Problems (2017)
- Efficient Performance Bounds For Primal-dual Reinforcement Learning From Demonstrations (2021)
- A Two-timescale Primal-dual Framework For Reinforcement Learning Via Online Dual Variable Guidance (2025)
- Deep Exploration With PAC-Bayes (2024)
- Broad Critic Deep Actor Reinforcement Learning For Continuous Control (2024)
- Potential Field Guided Actor-critic Reinforcement Learning (2020)
- Parameter Sharing Deep Deterministic Policy Gradient For Cooperative Multi-agent Reinforcement Learning (2017)
- Actor-critic Reinforcement Learning With Phased Actor (2024)