How To Enable Uncertainty Estimation In Proximal Policy Optimization
2022 · Eugene Bykovets, Yannick Metz, Mennatallah El-Assady, et al.
Abstract
While deep reinforcement learning (RL) agents have showcased strong results across many domains, a major concern is their inherent opaqueness and the safety of such systems in real-world use cases. To overcome these issues, we need agents that can quantify their uncertainty and detect out-of-distribution (OOD) states. Existing uncertainty estimation techniques, like Monte-Carlo Dropout or Deep Ensembles, have not seen widespread adoption in on-policy deep RL. We posit that this is due to two reasons. First, concepts like uncertainty and OOD states are not as well defined as in supervised learning, especially for on-policy RL methods. Second, available implementations and comparative studies for uncertainty estimation methods in RL have been limited. To overcome the first gap, we propose definitions of uncertainty and OOD for Actor-Critic RL algorithms, namely, proximal policy optimization (PPO), and present possible applicable measures. In particular, we discuss the concepts of value and
Related papers
- Deep Model-based Reinforcement Learning Via Estimated Uncertainty And Conservative Policy Optimization (2019) 0.00
- A Theoretical Analysis Of Optimistic Proximal Policy Optimization In Linear Markov Decision Processes (2023) 0.00
- Truly Proximal Policy Optimization (2019) 0.00
- Uncertainty-aware Policy Optimization: A Robust, Adaptive Trust Region Approach (2020) 0.00
- Uncertainty Quantification And Exploration For Reinforcement Learning (2019) 6.77
- Reinforcement Learning For Robotics And Control With Active Uncertainty Reduction (2019) 0.00
- Provably Efficient Exploration In Policy Optimization (2019) 0.00
- Proximal Policy Optimization Via Enhanced Exploration Efficiency (2020) 13.70