Beyond Exponentially Fast Mixing In Average-reward Reinforcement Learning Via Multi-level Monte Carlo Actor-critic
2023 · Wesley A. Suttle, Amrit Singh Bedi, Bhrij Patel, et al.
Abstract
Many existing reinforcement learning (RL) methods employ stochastic gradient iteration on the back end, whose stability hinges upon a hypothesis that the data-generating process mixes exponentially fast with a rate parameter that appears in the step-size selection. Unfortunately, this assumption is violated for large state spaces or settings with sparse rewards, and the mixing time is unknown, making the step size inoperable. In this work, we propose an RL methodology attuned to the mixing time by employing a multi-level Monte Carlo estimator for the critic, the actor, and the average reward embedded within an actor-critic (AC) algorithm. This method, which we call Multi-level Actor-Critic (MAC), is developed especially for infinite-horizon average-reward settings and neither relies on oracle knowledge of the mixing time in its parameter selection nor assumes its exponential decay; it, therefore, is readily applicable to applications with slower mixing
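The abstract does not spell out the form of the multi-level Monte Carlo (MLMC) estimator, so the following is only a minimal sketch of one commonly used construction: draw a random level J with P(J = j) = 2^{-j}, collect 2^J consecutive samples when that fits within a cap, and correct the single-sample estimate with a rescaled difference of two averages. The names `mlmc_estimate`, `sample_fn`, and `t_max` are hypothetical and do not come from the paper; whether MAC uses exactly this truncation is an assumption.

```python
import numpy as np


def mlmc_estimate(sample_fn, t_max, rng):
    """Sketch of a truncated multi-level Monte Carlo estimate (assumed form).

    sample_fn(n) is a hypothetical callback returning n consecutive samples
    (for example, per-step gradient estimates along one trajectory) as an
    array whose first axis has length n.
    """
    # Level J with P(J = j) = 2^{-j} for j = 1, 2, ...
    j = int(rng.geometric(0.5))
    # Fall back to a single sample when 2^J would exceed the cap t_max.
    n = 2 ** j if 2 ** j <= t_max else 1
    samples = np.asarray(sample_fn(n), dtype=float)

    g0 = samples[0]  # estimate from the first sample alone
    if n == 1:
        return g0

    g_fine = samples[: 2 ** j].mean(axis=0)          # average of 2^J samples
    g_coarse = samples[: 2 ** (j - 1)].mean(axis=0)  # average of 2^(J-1) samples
    # The factor 2^J cancels the probability 2^{-J} of drawing level J.
    return g0 + (2 ** j) * (g_fine - g_coarse)
```

Under this construction the expectation telescopes to that of an average over the largest power of two not exceeding `t_max`, while the expected number of samples drawn per call grows only logarithmically in `t_max`. A caller would pass a seeded generator such as `np.random.default_rng(0)` and a `sample_fn` that rolls the current policy forward for `n` steps.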
Related papers
- Towards Global Optimality For Practical Average Reward Reinforcement Learning Without Mixing Time Oracles (2024)
- A Sharper Global Convergence Analysis For Average Reward Reinforcement Learning Via An Actor-critic Approach (2024)
- Monte Carlo Augmented Actor-critic For Sparse Reward Deep Reinforcement Learning From Suboptimal Demonstrations (2022)
- Multi-agent Reinforcement Learning Accelerated MCMC On Multiscale Inversion Problem (2020)
- Natural Policy Gradient For Average Reward Non-stationary RL (2025)
- Finite-time Convergence And Sample Complexity Of Actor-critic Multi-objective Reinforcement Learning (2024)
- Mean Actor Critic (2017)
- Soft Actor-critic: Off-policy Maximum Entropy Deep Reinforcement Learning With A Stochastic Actor (2018)