Understanding The Effects Of Second-order Approximations In Natural Policy Gradient Reinforcement Learning
2022 · Brennan Gebotys, Alexander Wong, David A. Clausi
Abstract
Natural policy gradient methods are popular reinforcement learning methods that improve the stability of policy gradient methods by using second-order approximations to precondition the gradient with the inverse of the Fisher-information matrix. However, to the best of the authors' knowledge, no study has investigated the effects of different second-order approximations in a comprehensive and systematic manner. To address this, five different second-order approximations were studied and compared across multiple key metrics, including performance, stability, sample efficiency, and computation time. Furthermore, hyperparameters that are not typically acknowledged in the literature are also studied, including the effect of different batch sizes and of optimizing the critic network with the natural gradient. Experimental results show that, on average, improved second-order approximations achieve the best performance and that using properly tuned hyperparameters can lead to
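As a minimal illustration of what "preconditioning the gradient with the inverse of the Fisher-information matrix" means, the sketch below assumes a toy one-step bandit with a softmax policy over four actions; this is not the paper's experimental setup, and the paper's five approximations are not named here. It computes the vanilla gradient, the exact Fisher matrix F = diag(π) - ππᵀ, and the damped natural gradient (F + λI)⁻¹∇J, once with a dense solve and once with conjugate gradient using only Fisher-vector products. Conjugate gradient is shown only as one widely used example of an approximate solve (TRPO uses it); it is not claimed to be one of the approximations studied in the paper. The function names, the damping value λ, and the reward vector are hypothetical.

```python
import numpy as np

# Toy assumption: one-step bandit with a softmax policy; not the paper's setup.


def softmax(theta):
    """Numerically stable softmax over a 1-D parameter vector."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


def policy_gradient(theta, rewards):
    """Exact gradient of J(theta) = sum_a pi(a) * r(a) for the softmax policy."""
    p = softmax(theta)
    return p * (rewards - p @ rewards)


def fisher_matrix(theta):
    """Exact Fisher information E[score score^T] = diag(pi) - pi pi^T."""
    p = softmax(theta)
    return np.diag(p) - np.outer(p, p)


def conjugate_gradient(matvec, b, iters=50, tol=1e-12):
    """Solve A x = b for symmetric positive-definite A using only products A @ v."""
    x = np.zeros_like(b)
    r = b.copy()  # residual b - A x (x starts at zero)
    d = r.copy()  # current search direction
    rs = r @ r
    for _ in range(iters):
        if np.sqrt(rs) < tol:
            break
        Ad = matvec(d)
        alpha = rs / (d @ Ad)
        x += alpha * d
        r -= alpha * Ad
        rs_new = r @ r
        d = r + (rs_new / rs) * d
        rs = rs_new
    return x


def natural_gradient(theta, rewards, damping=1e-3, use_cg=True):
    """Precondition the policy gradient with (F + damping * I)^{-1}."""
    g = policy_gradient(theta, rewards)
    F = fisher_matrix(theta)
    if use_cg:
        # Approximate solve: needs only Fisher-vector products, never F^{-1}.
        return conjugate_gradient(lambda v: F @ v + damping * v, g)
    # Reference: explicit dense solve (practical only for few parameters).
    return np.linalg.solve(F + damping * np.eye(len(theta)), g)


if __name__ == "__main__":
    rewards = np.array([1.0, 0.0, 0.5, 0.2])
    theta = np.array([2.0, -1.0, 0.0, 0.5])
    print("vanilla gradient:", policy_gradient(theta, rewards))
    print("natural gradient (CG):", natural_gradient(theta, rewards))
    print("natural gradient (dense):", natural_gradient(theta, rewards, use_cg=False))
```

Damping is needed here because the softmax Fisher matrix is singular (the all-ones direction lies in its null space); with λ > 0, F + λI is positive definite, so conjugate gradient converges in at most as many iterations as there are parameters in exact arithmetic. Note how the vanilla gradient is scaled down for actions the policy rarely takes, while the natural gradient largely removes that dependence on the current parameterization.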
Related papers
- Natural Policy Gradients In Reinforcement Learning Explained (2022)
- On The Linear Convergence Of Natural Policy Gradient Algorithm (2021)
- On The Second-order Convergence Of Biased Policy Gradient Algorithms (2023)
- On The Theory Of Policy Gradient Methods: Optimality, Approximation, And Distribution Shift (2019)
- Symmetric (optimistic) Natural Policy Gradient For Multi-agent Learning With Parameter Convergence (2022)
- Linear Function Approximation As A Computationally Efficient Method To Solve Classical Reinforcement Learning Challenges (2024)
- Global Convergence Of Natural Policy Gradient With Hessian-aided Momentum Variance Reduction (2024)
- Reusing Historical Trajectories In Natural Policy Gradient Via Importance Sampling: Convergence And Convergence Rate (2024)