Policy Optimization In A Noisy Neighborhood: On Return Landscapes In Continuous Control
2023 · Nate Rahn, Pierluca D'Oro, Harley Wiltzer, et al.
Abstract
Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, eva
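The abstract's central measurement is the distribution of returns reached by a single update from a fixed policy. The sketch below illustrates that idea under loudly stated assumptions; it is not the authors' procedure. `toy_return` is a hypothetical analytic stand-in for rolling out a policy in an environment. The minibatch gradient is modeled as the exact gradient plus Gaussian noise. The lower-tail statistic `cvar` (mean of the worst 10% of returns) is a CVaR-style measure chosen here for illustration, not necessarily the paper's metric. `path_statistics` uses straight-line interpolation between two parameter vectors as one hypothetical example of a "simple path" in parameter space.

```python
import numpy as np

# Hypothetical toy landscape: a smooth bowl plus a high-frequency ripple,
# standing in for the (unknown) map from policy parameters to episode return.
RIPPLE_AMP = 5.0
RIPPLE_FREQ = 20.0


def toy_return(theta: np.ndarray) -> float:
    smooth = -float(np.sum(theta ** 2))
    ripple = RIPPLE_AMP * float(np.sum(np.sin(RIPPLE_FREQ * theta)))
    return smooth + ripple


def toy_grad(theta: np.ndarray) -> np.ndarray:
    # Exact gradient of toy_return with respect to theta.
    return -2.0 * theta + RIPPLE_AMP * RIPPLE_FREQ * np.cos(RIPPLE_FREQ * theta)


def neighborhood_returns(theta, n_updates=200, lr=1e-3, noise_std=50.0, seed=0):
    """Returns of policies one noisy gradient-ascent step away from theta."""
    rng = np.random.default_rng(seed)
    returns = np.empty(n_updates)
    for i in range(n_updates):
        # Assumption: a minibatch gradient = true gradient + Gaussian noise.
        g = toy_grad(theta) + noise_std * rng.standard_normal(theta.shape)
        returns[i] = toy_return(theta + lr * g)
    return returns


def summarize(returns, alpha=0.1):
    """Mean, spread, and lower tail (mean of the worst alpha fraction)."""
    worst = np.sort(returns)[: max(1, int(alpha * len(returns)))]
    return {
        "mean": float(np.mean(returns)),
        "std": float(np.std(returns)),
        "cvar": float(np.mean(worst)),
    }


def path_statistics(theta_a, theta_b, n_points=5, **kwargs):
    """Neighborhood statistics along the straight line from theta_a to theta_b."""
    stats = []
    for t in np.linspace(0.0, 1.0, n_points):
        theta_t = (1.0 - t) * theta_a + t * theta_b
        stats.append(summarize(neighborhood_returns(theta_t, **kwargs)))
    return stats


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    theta_a, theta_b = rng.standard_normal(8), rng.standard_normal(8)
    for t, s in zip(np.linspace(0, 1, 5), path_statistics(theta_a, theta_b)):
        print(f"t={t:.2f} mean={s['mean']:.2f} std={s['std']:.2f} cvar={s['cvar']:.2f}")
```

On a real task, `toy_return` would be replaced by averaging episode returns from environment rollouts, and each update would come from the agent's own optimizer. The ripple term only mimics how a small parameter change can swing the return; two policies with similar mean returns can still differ sharply in `std` and `cvar`, which is the kind of "hidden dimension of policy quality" the abstract refers to.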
Related papers
- Moments Matter: Stabilizing Policy Optimization Using Return Distributions (2026)
- Distributional Policy Optimization: An Alternative Approach For Continuous Control (2019)
- Policy Optimization For Continuous Reinforcement Learning (2023)
- Unified Policy Optimization For Continuous-action Reinforcement Learning In Non-stationary Tasks And Games (2022)
- On The Sample Complexity And Metastability Of Heavy-tailed Policy Search In Continuous Control (2021)
- Conservative Exploration For Policy Optimization Via Off-policy Policy Evaluation (2023)
- Policy Search By Target Distribution Learning For Continuous Control (2019)
- Regularization Matters In Policy Optimization (2019)