Reinforcement Learning With Wasserstein Distance Regularisation, With Applications To Multipolicy Learning
2018 · Mohammed Amin Abdullah, Aldo Pacchiano, Moez Draief
Abstract
We describe an application of the Wasserstein distance to reinforcement learning. The distance in question is between the distribution obtained by mapping a policy's trajectories into some metric space and some other fixed distribution (which may, for example, come from another policy). Different policies induce different distributions, so given an underlying metric, the Wasserstein distance quantifies how different the policies are. Using a Wasserstein regulariser, this can be used to learn multiple policies that differ from one another in terms of such Wasserstein distances. By changing the sign of the regularisation parameter, one can instead learn a policy whose trajectory-mapping distribution is attracted to a given fixed distribution.
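As a rough illustration of the idea in the abstract, the sketch below computes an empirical 1-Wasserstein distance between two sets of one-dimensional trajectory embeddings and adds it to an average return as a regulariser. It is a minimal sketch under stated assumptions, not the authors' method: the scalar embedding, the equal sample sizes, the function names `wasserstein_1d` and `regularised_objective`, and the sign convention (positive `lam` rewards distance, negative `lam` penalises it) are all hypothetical choices made for this example.

```python
from typing import Sequence


def wasserstein_1d(samples_a: Sequence[float], samples_b: Sequence[float]) -> float:
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    For uniform weights and equal sample sizes in one dimension, the optimal
    transport plan matches sorted samples, so the distance is the mean
    absolute difference between the sorted values.
    """
    if len(samples_a) != len(samples_b) or len(samples_a) == 0:
        raise ValueError("expected two non-empty samples of equal size")
    a = sorted(samples_a)
    b = sorted(samples_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def regularised_objective(
    returns: Sequence[float],
    embeddings: Sequence[float],
    reference: Sequence[float],
    lam: float,
) -> float:
    """Average return plus lam times the Wasserstein distance to a reference.

    Assumed convention (hypothetical): lam > 0 pushes the policy's embedding
    distribution away from the reference, lam < 0 pulls it towards it.
    """
    mean_return = sum(returns) / len(returns)
    return mean_return + lam * wasserstein_1d(embeddings, reference)


# Toy data: embeddings of sampled trajectories and of a fixed reference.
returns = [1.0, 1.0, 1.0]
embeddings = [0.0, 1.0, 2.0]
reference = [4.0, 2.0, 3.0]

repel = regularised_objective(returns, embeddings, reference, lam=0.5)
attract = regularised_objective(returns, embeddings, reference, lam=-0.5)
```

With the same returns, the repulsive setting scores this policy higher than the attractive one because its embeddings sit far from the reference. In practice the embedding would usually be multi-dimensional, which requires a general optimal-transport solver rather than the sorting shortcut used here.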
Related papers
- Learning To Score Behaviors For Guided Policy Optimization (2019) 0.00
- Distributional Robustness And Regularization In Reinforcement Learning (2020) 0.00
- Distributional Reinforcement Learning With Regularized Wasserstein Loss (2022) 0.00
- Federated Distributional Reinforcement Learning With Distributional Critic Regularization (2026) 0.00
- Wasserstein Distance Maximizing Intrinsic Control (2021) 0.00
- Distill Knowledge In Multi-task Reinforcement Learning With Optimal-transport Regularization (2023) 0.00
- Efficient Wasserstein Natural Gradients For Reinforcement Learning (2020) 0.00
- Regularization Matters In Policy Optimization (2019) 2.68