Efficient Wasserstein Natural Gradients For Reinforcement Learning
2020 · Ted Moskovitz, Michael Arbel, Ferenc Huszar, et al.
Abstract
A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL). The procedure uses a computationally efficient Wasserstein natural gradient (WNG) descent that takes advantage of the geometry induced by a Wasserstein penalty to speed optimization. This method follows the recent theme in RL of including a divergence penalty in the objective to establish a trust region. Experiments on challenging tasks demonstrate improvements in both computational cost and performance over advanced baselines.
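To make the role of the penalty's geometry concrete, here is a minimal sketch. It is not the paper's estimator; it is a toy case chosen because everything is available in closed form. It assumes a 1-D Gaussian with parameters (mu, sigma), for which the squared 2-Wasserstein distance is Euclidean in those coordinates, so the Wasserstein metric is the identity, while the Fisher metric is diag(1/sigma^2, 2/sigma^2). The function names and the quadratic-penalty formulation are illustrative assumptions.

```python
import numpy as np


def w2_squared_gaussian(mu1, sigma1, mu2, sigma2):
    """Closed-form squared 2-Wasserstein distance between two 1-D Gaussians."""
    return (mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2


def wasserstein_metric(sigma):
    """Metric induced by W2 in (mu, sigma) coordinates: the identity (toy case)."""
    return np.eye(2)


def fisher_metric(sigma):
    """Fisher information of N(mu, sigma^2) in (mu, sigma) coordinates."""
    return np.diag([1.0 / sigma**2, 2.0 / sigma**2])


def penalized_step(grad, metric, step_size):
    """Update u maximizing grad @ u - (u @ metric @ u) / (2 * step_size).

    Setting the derivative to zero gives u = step_size * metric^{-1} grad,
    i.e. a natural-gradient step with respect to the chosen metric.
    """
    return step_size * np.linalg.solve(metric, grad)


if __name__ == "__main__":
    grad = np.array([1.0, 0.5])  # hypothetical objective gradient w.r.t. (mu, sigma)
    sigma = 0.1                  # a nearly deterministic distribution
    print("Wasserstein step:", penalized_step(grad, wasserstein_metric(sigma), 0.1))
    print("Fisher step:     ", penalized_step(grad, fisher_metric(sigma), 0.1))
```

In this toy setting the Fisher-preconditioned step shrinks in proportion to sigma^2 as the distribution becomes nearly deterministic, while the Wasserstein-preconditioned step does not. This is only one simple illustration of how the choice of penalty changes the update direction and scale.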
Related papers
- Natural Policy Gradients In Reinforcement Learning Explained (2022)
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)
- Optimistic Natural Policy Gradient: A Simple Efficient Policy Optimization Framework For Online RL (2023)
- Learning To Score Behaviors For Guided Policy Optimization (2019)
- Variational Policy Gradient Method For Reinforcement Learning With General Utilities (2020)
- Policy Gradient For Reinforcement Learning With General Utilities (2022)
- Policy Gradient Using Weak Derivatives For Reinforcement Learning (2020)
- A Nearly Blackwell-optimal Policy Gradient Method (2021)