Absolute Policy Optimization
2023 · Weiye Zhao, Feihan Li, Yifan Sun, et al.
Abstract
In recent years, trust region on-policy reinforcement learning has achieved impressive results in addressing complex control tasks and gaming scenarios. However, contemporary state-of-the-art algorithms within this category primarily emphasize improvement in expected performance and lack the ability to control worst-case performance outcomes. To address this limitation, we introduce a novel objective function whose optimization leads to guaranteed monotonic improvement in the lower probability bound of performance with high confidence. Building upon this theoretical advancement, we further introduce a practical solution called Absolute Policy Optimization (APO). Our experiments demonstrate the effectiveness of our approach across challenging continuous control benchmark tasks and extend its applicability to mastering Atari games. Our findings reveal that APO as well as its efficient variation Proximal Absolute Policy Optimization (PAPO) significantly outperforms
Related papers
- Simple Policy Optimization (2024)
- Truly Proximal Policy Optimization (2019)
- ANO: A Principled Approach To Robust Policy Optimization (2026)
- Proximal Policy Optimization Algorithms (2017)
- Average-reward Reinforcement Learning With Trust Region Methods (2021)
- Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization (2018)
- Provably Efficient Exploration In Policy Optimization (2019)
- Uncertainty-aware Policy Optimization: A Robust, Adaptive Trust Region Approach (2020)