Survival Of The Fittest: Evolutionary Adaptation Of Policies For Environmental Shifts
2024 · Sheryl Paul, Jyotirmoy V. Deshmukh
Abstract
Reinforcement learning (RL) has been successfully applied to the problem of finding obstacle-free paths for autonomous agents operating in stochastic and uncertain environments. However, when the underlying stochastic dynamics of the environment experience drastic distribution shifts, the optimal policy obtained in the training environment may be sub-optimal or may fail entirely to find goal-reaching paths for the agent. Approaches like domain randomization and robust RL can provide robust policies, but typically assume minor (bounded) distribution shifts. For substantial distribution shifts, retraining (either with a warm-start policy or from scratch) is an alternative approach. In this paper, we develop a novel approach called Evolutionary Robust Policy Optimization (ERPO), an adaptive re-training algorithm inspired by evolutionary game theory (EGT). ERPO learns an optimal policy for the shifted environment iteratively using a temperature parameter that controls
Related papers
- Evolution-guided Policy Gradient In Reinforcement Learning (2018)
- State Regularized Policy Optimization On Data With Dynamics Shift (2023)
- Survival Dynamics Of Neural And Programmatic Policies In Evolutionary Reinforcement Learning (2026)
- Robust Adversarial Policy Optimization Under Dynamics Uncertainty (2026)
- Learning A Subspace Of Policies For Online Adaptation In Reinforcement Learning (2021)
- Reinforcement Learning With Non-ergodic Reward Increments: Robustness Via Ergodicity Transformations (2023)
- ERL-Re\(^2\): Efficient Evolutionary Reinforcement Learning With Shared State Representation And Individual Policy Representation (2022)
- Moments Matter: Stabilizing Policy Optimization Using Return Distributions (2026)