A Two-timescale Primal-dual Framework For Reinforcement Learning Via Online Dual Variable Guidance
2025 · Axel Friedrich Wolter, Tobias Sutter
Abstract
We study reinforcement learning by combining recent advances in regularized linear programming formulations with the classical theory of stochastic approximation. Motivated by the challenge of designing algorithms that leverage off-policy data while maintaining on-policy exploration, we propose PGDA-RL, a novel primal-dual Projected Gradient Descent-Ascent algorithm for solving regularized Markov Decision Processes (MDPs). PGDA-RL integrates experience replay-based gradient estimation with a two-timescale decomposition of the underlying nested optimization problem. The algorithm operates asynchronously, interacts with the environment through a single trajectory of correlated data, and updates its policy online in response to the dual variable associated with the occupancy measure of the underlying MDP. We prove that PGDA-RL converges almost surely to the optimal value function and policy of the regularized MDP. Our convergence analysis relies on tools from stochastic approximation theo
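The two-timescale projected descent-ascent idea described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not PGDA-RL itself: the quadratic saddle objective, the box constraint, the step-size exponents, the choice of which variable runs on the fast timescale, and the names `project` and `two_timescale_pgda` are all assumptions made for illustration. There is no MDP, experience replay, or policy update here.

```python
import random


def project(z, lo=-1.0, hi=1.0):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, z))


def two_timescale_pgda(steps=20000, noise=0.1, seed=0):
    """Noisy projected gradient descent-ascent on a toy saddle problem.

    Illustrative objective (an assumption, not taken from the paper):
        min_x max_y f(x, y) = 0.5*x**2 + x*y - 0.5*y**2
    with x, y restricted to [-1, 1]. Its unique saddle point is (0, 0).
    """
    rng = random.Random(seed)
    x, y = 0.9, -0.8  # arbitrary starting point inside the box
    for n in range(1, steps + 1):
        fast = 1.0 / n ** 0.6  # fast timescale: larger step sizes
        slow = 1.0 / n         # slow timescale: slow / fast -> 0 as n grows
        # Noisy gradient estimates, both evaluated at the current iterate.
        gx = x + y + noise * rng.gauss(0.0, 1.0)  # estimate of df/dx
        gy = x - y + noise * rng.gauss(0.0, 1.0)  # estimate of df/dy
        x = project(x - fast * gx)  # projected descent step (fast variable)
        y = project(y + slow * gy)  # projected ascent step (slow variable)
    return x, y


if __name__ == "__main__":
    print(two_timescale_pgda())
```

Because the slow step size vanishes relative to the fast one, the fast variable sees the slow variable as nearly frozen and tracks its best response, while the slow variable drifts toward the saddle point. This separation is the general mechanism a two-timescale stochastic approximation analysis exploits.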
Related papers
- A Policy Gradient Primal-dual Algorithm For Constrained MDPs With Uniform PAC Guarantees (2024)
- Provably Efficient Primal-dual Reinforcement Learning For CMDPs With Non-stationary Objectives And Constraints (2022)
- A Primal-dual Algorithm For Offline Constrained Reinforcement Learning With Linear MDPs (2024)
- Merging Deterministic Policy Gradient Estimations With Varied Bias-variance Tradeoff For Effective Deep Reinforcement Learning (2019)
- Stochastic Primal-dual Methods And Sample Complexity Of Reinforcement Learning (2016)
- Last-iterate Global Convergence Of Policy Gradients For Constrained Reinforcement Learning (2024)
- Dual RL: Unification And New Methods For Reinforcement And Imitation Learning (2023)
- Asynchronous Episodic Deep Deterministic Policy Gradient: Towards Continuous Control In Computationally Complex Environments (2019)