CIM-PPO: Proximal Policy Optimization With Liu-Correntropy Induced Metric
2021 · Yunxiao Guo, Han Long, Xiaojun Duan, et al.
Abstract
As a popular Deep Reinforcement Learning (DRL) algorithm, Proximal Policy Optimization (PPO) has demonstrated remarkable efficacy in numerous complex tasks. According to the penalty mechanism in a surrogate, PPO can be classified into PPO with KL divergence (PPO-KL) and PPO with Clip (PPO-Clip). In this paper, we analyze the impact of asymmetry in KL divergence on PPO-KL and highlight that when this asymmetry is pronounced, it will misguide the improvement of the surrogate. To address this issue, we represent the PPO-KL in inner product form and demonstrate that the KL divergence is a Correntropy Induced Metric (CIM) in Euclidean space. Subsequently, we extend the PPO-KL to the Reproducing Kernel Hilbert Space (RKHS), redefine the inner products with RKHS, and propose the PPO-CIM algorithm. Moreover, this paper states that the PPO-CIM algorithm has a lower computation cost in policy gradient and proves that PPO-CIM can guarantee the new policy is within the trust region while the kerne
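The asymmetry the abstract refers to can be seen on a toy example. The sketch below is illustrative only and is not the paper's PPO-CIM algorithm: the two action distributions are hypothetical, and the CIM is computed in its standard sample-based form, sqrt(kappa(0) - mean kappa(p_i - q_i)), with an unnormalized Gaussian kernel and a bandwidth `sigma` chosen here as an assumption. It shows that swapping the arguments changes the KL divergence but leaves the CIM unchanged.

```python
import numpy as np


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with strictly positive entries."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))


def cim(p, q, sigma=1.0):
    """Correntropy induced metric between two probability vectors.

    Assumption: unnormalized Gaussian kernel kappa(e) = exp(-e^2 / (2 sigma^2)),
    so kappa(0) = 1 and CIM(p, q) = sqrt(1 - mean_i kappa(p_i - q_i)).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    kernel = np.exp(-((p - q) ** 2) / (2.0 * sigma ** 2))
    return float(np.sqrt(1.0 - np.mean(kernel)))


# Hypothetical action distributions of an old and a new policy in one state.
old_policy = np.array([0.90, 0.05, 0.05])
new_policy = np.array([0.40, 0.30, 0.30])

kl_forward = kl_divergence(old_policy, new_policy)
kl_reverse = kl_divergence(new_policy, old_policy)
cim_forward = cim(old_policy, new_policy)
cim_reverse = cim(new_policy, old_policy)

print("KL(old || new):", kl_forward)
print("KL(new || old):", kl_reverse)
print("CIM(old, new): ", cim_forward)
print("CIM(new, old): ", cim_reverse)
```

Because the kernel depends only on the squared difference of the two arguments, the CIM is symmetric by construction, whereas the penalty from KL depends on which policy is treated as the reference.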
Related papers
- Truly Proximal Policy Optimization (2019) · 0.00
- KIPPO: Koopman-inspired Proximal Policy Optimization (2025) · 0.00
- Proximal Policy Optimization Via Enhanced Exploration Efficiency (2020) · 13.70
- PPO-CMA: Proximal Policy Optimization With Covariance Matrix Adaptation (2018) · 0.00
- A Theoretical Analysis Of Optimistic Proximal Policy Optimization In Linear Markov Decision Processes (2023) · 0.00
- Proximal Policy Optimization With Relative Pearson Divergence (2020) · 6.77
- Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization (2018) · 0.00
- Revisiting Design Choices In Proximal Policy Optimization (2020) · 0.00