Stochastic Policy Gradient Methods: Improved Sample Complexity For Fisher-non-degenerate Policies

2023

arXiv:fatkhullin2023stochastic

Abstract

Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed the development of their theoretical foundations. Despite substantial effort devoted to designing efficient stochastic PG-type algorithms, our understanding of their convergence to a globally optimal policy remains limited. In this work, we develop improved global convergence guarantees for a general class of Fisher-non-degenerate parameterized policies, a class that covers continuous state and action spaces. First, we propose a Normalized Policy Gradient method with Implicit Gradient Transport (N-PG-IGT) and derive a \(\tilde{\mathcal{O}}(\epsilon^{-2.5})\) sample complexity for finding a global \(\epsilon\)-optimal policy with this method. This improves over the previously known \(\tilde{\mathcal{O}}(\epsilon^{-3})\) complexity; moreover, the algorithm requires neither importance sampling nor second-order information, and it samples only one trajectory per iteration. Second, we
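The ingredients named above (one sampled trajectory per iteration, a momentum estimate, implicit gradient transport, and a normalized step) can be illustrated with a minimal sketch. It assumes the generic IGT pattern: query the stochastic gradient at an extrapolated point \(\tilde\theta_t = \theta_t + \frac{1-\eta_t}{\eta_t}(\theta_t - \theta_{t-1})\), average it into a momentum vector \(d_t = (1-\eta_t)d_{t-1} + \eta_t g_t\), and take an ascent step of fixed length \(\gamma_t\) along \(d_t/\|d_t\|\). Everything else is hypothetical and chosen for illustration only: the function names `n_pg_igt`, `reinforce_oracle`, and `exact_oracle`, the schedules \(\eta_t = (t+1)^{-4/5}\) and \(\gamma_t = 0.5\,(t+1)^{-3/5}\), and the toy one-step Gaussian-policy "bandit" with continuous actions. This is not the authors' implementation or their tuned constants.

```python
import numpy as np

def n_pg_igt(grad_oracle, theta0, num_iters, rng,
             eta_fn=lambda t: (t + 1) ** -0.8,          # assumed momentum schedule
             gamma_fn=lambda t: 0.5 * (t + 1) ** -0.6):  # assumed step-size schedule
    """Sketch of normalized PG ascent with implicit gradient transport.

    grad_oracle(theta, rng) returns a stochastic gradient estimate of J at
    theta built from ONE sampled trajectory (hypothetical interface).
    Returns the final parameters and the list of all iterates.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    theta_prev = theta.copy()
    d = np.zeros_like(theta)  # momentum (gradient-tracking) vector
    history = [theta.copy()]
    for t in range(num_iters):
        eta, gamma = eta_fn(t), gamma_fn(t)
        # IGT: evaluate the gradient at an extrapolated point so the old
        # momentum is "transported" to the current iterate (exact for quadratics).
        theta_tilde = theta + (1.0 - eta) / eta * (theta - theta_prev)
        g = grad_oracle(theta_tilde, rng)
        d = (1.0 - eta) * d + eta * g
        # Normalized ascent step: its length is gamma whatever the size of d.
        step = gamma * d / max(np.linalg.norm(d), 1e-12)
        theta_prev, theta = theta, theta + step
        history.append(theta.copy())
    return theta, history

# Hypothetical toy problem: Gaussian policy a ~ N(theta, SIGMA^2 I),
# reward r(a) = -||a - A_STAR||^2, so J(theta) = -||theta - A_STAR||^2 - dim * SIGMA^2.
A_STAR = np.array([2.0, -1.0])
SIGMA = 0.5

def reward(a):
    return -np.sum((a - A_STAR) ** 2)

def reinforce_oracle(theta, rng):
    # One "trajectory" = one sampled action.
    a = theta + SIGMA * rng.standard_normal(theta.shape)
    score = (a - theta) / SIGMA**2  # grad_theta log pi_theta(a)
    baseline = reward(theta)        # action-independent, so the estimate stays unbiased
    return (reward(a) - baseline) * score

def exact_oracle(theta, rng):
    # Noise-free gradient of J, useful as a sanity check.
    return -2.0 * (theta - A_STAR)

if __name__ == "__main__":
    theta, _ = n_pg_igt(reinforce_oracle, np.zeros(2), 2000, np.random.default_rng(0))
    print("final theta:", theta, "target:", A_STAR)
```

Normalizing the step decouples its length from the (possibly heavy-tailed) magnitude of the REINFORCE estimate, while the extrapolated query point lets a small \(\eta_t\) average out noise without the momentum lagging behind \(\theta_t\). With `exact_oracle` on this quadratic objective, the IGT momentum equals the true gradient at every iterate, so the sketch reduces to normalized gradient ascent, which makes it easy to sanity-check.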