A Sharper Global Convergence Analysis For Average Reward Reinforcement Learning Via An Actor-critic Approach
2024 · Swetha Ganesh, Washim Uddin Mondal, Vaneet Aggarwal
Abstract
This work examines average-reward reinforcement learning with general policy parametrization. Existing state-of-the-art (SOTA) guarantees for this problem are either suboptimal or hindered by several challenges, including poor scalability with respect to the size of the state-action space, high iteration complexity, and dependence on knowledge of mixing times and hitting times. To address these limitations, we propose a Multi-level Monte Carlo-based Natural Actor-Critic (MLMC-NAC) algorithm. Our work is the first to achieve a global convergence rate of \(\tilde{\mathcal{O}}(1/\sqrt{T})\) for average-reward Markov Decision Processes (MDPs), where \(T\) is the horizon length, without requiring knowledge of mixing and hitting times. Moreover, the convergence rate does not scale with the size of the state space, so the result applies even to infinite state spaces.
Related papers
- Towards Global Optimality For Practical Average Reward Reinforcement Learning Without Mixing Time Oracles (2024)
- Beyond Exponentially Fast Mixing In Average-reward Reinforcement Learning Via Multi-level Monte Carlo Actor-critic (2023)
- Natural Policy Gradient For Average Reward Non-stationary RL (2025)
- Finite-time Convergence And Sample Complexity Of Actor-critic Multi-objective Reinforcement Learning (2024)
- Single-timescale Actor-critic Provably Finds Globally Optimal Policy (2020)
- Scalable Multi-agent Reinforcement Learning For Networked Systems With Average Reward (2020)
- Non-asymptotic Convergence Analysis Of Two Time-scale (natural) Actor-critic Algorithms (2020)
- Learning General Parameterized Policies For Infinite Horizon Average Reward Constrained MDPs Via Primal-dual Policy Gradient Algorithm (2024)