Fast Stochastic Policy Gradient: Negative Momentum For Reinforcement Learning
2024 · Haobin Zhang, Zhuang Yang
Abstract
Stochastic optimization algorithms, particularly stochastic policy gradient (SPG), have achieved significant success in reinforcement learning (RL). Nevertheless, how to obtain an optimal solution for RL quickly remains a challenge. To tackle this issue, this work develops a fast SPG algorithm that exploits momentum, coined SPG-NM. Specifically, SPG-NM applies a novel type of negative momentum (NM) technique to the classical SPG algorithm. Unlike existing NM techniques, SPG-NM adopts only a few hyper-parameters. Moreover, its computational complexity is nearly the same as that of modern SPG-type algorithms, e.g., accelerated policy gradient (APG), which equips SPG with Nesterov's accelerated gradient (NAG). We evaluate the resulting algorithm on two classical tasks, the bandit setting and the Markov decision process (MDP). Numerical results in different tasks demonstrate a faster convergence rate of the resulting algorithm b
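The abstract does not state the SPG-NM update rule, so the sketch below is not the authors' algorithm. It is a minimal illustration of softmax policy-gradient ascent on a hypothetical three-armed bandit (the first task type named above), assuming exact gradients instead of sampled (stochastic) ones and the generic heavy-ball form of momentum with a negative coefficient `beta`. The function `pg_momentum`, the reward vector, the step size `eta`, and `beta` are all illustrative assumptions; `beta=0` recovers plain policy gradient.

```python
import numpy as np


def softmax(theta):
    """Numerically stable softmax policy pi_theta over the arms."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


def bandit_value_and_grad(theta, rewards):
    """Exact value V(theta) = sum_a pi(a) r(a) and its gradient.

    For a softmax policy, dV/dtheta_a = pi(a) * (r(a) - V(theta)).
    """
    pi = softmax(theta)
    value = float(pi @ rewards)
    return value, pi * (rewards - value)


def pg_momentum(rewards, eta=0.4, beta=0.0, steps=2000):
    """Policy-gradient ascent with a heavy-ball momentum term.

    beta = 0 is plain policy gradient; beta < 0 is a generic
    negative-momentum variant (hypothetical, NOT the paper's SPG-NM).
    Uses exact gradients rather than stochastic estimates.
    """
    theta = np.zeros(len(rewards))
    theta_prev = theta.copy()
    values = []
    for _ in range(steps):
        value, grad = bandit_value_and_grad(theta, rewards)
        values.append(value)
        # Ascent step plus beta times the previous displacement.
        theta_next = theta + eta * grad + beta * (theta - theta_prev)
        theta_prev, theta = theta, theta_next
    return softmax(theta), values


# Hypothetical 3-armed bandit in which arm 0 is optimal.
rewards = np.array([1.0, 0.8, 0.1])
pi_pg, values_pg = pg_momentum(rewards, beta=0.0)
pi_nm, values_nm = pg_momentum(rewards, beta=-0.3)
print("plain PG:         ", np.round(pi_pg, 3))
print("negative momentum:", np.round(pi_nm, 3))
```

Comparing `values_pg` and `values_nm` over iterations is one way to inspect convergence speed; the sketch itself makes no claim about which variant is faster.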
Related papers
- Optimistic Natural Policy Gradient: A Simple Efficient Policy Optimization Framework For Online RL (2023)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- Global Convergence Of Natural Policy Gradient With Hessian-aided Momentum Variance Reduction (2024)
- MDPGT: Momentum-based Decentralized Policy Gradient Tracking (2021)
- Stochastic Recursive Momentum For Policy Gradient Methods (2020)
- Where Did My Optimum Go?: An Empirical Analysis Of Gradient Descent Optimization In Policy Gradient Methods (2018)
- Bregman Gradient Policy Optimization (2021)
- On The Linear Convergence Of Natural Policy Gradient Algorithm (2021)