Complexity-regularized Proximal Policy Optimization
2025 · Luca Serfilippi, Giorgio Franceschelli, Antonio Corradi, et al.
Abstract
Policy gradient methods usually rely on entropy regularization to prevent premature convergence. However, maximizing entropy indiscriminately pushes the policy towards a uniform distribution, often overriding the reward signal if not optimally tuned. We propose replacing the standard entropy term with a self-regulating complexity term, defined as the product of Shannon entropy and disequilibrium, where the latter quantifies the distance from the uniform distribution. Unlike pure entropy, which favors maximal disorder, this complexity measure is zero for both fully deterministic and perfectly uniform distributions, i.e., it is strictly positive for systems that exhibit a meaningful interplay between order and randomness. These properties ensure the policy maintains beneficial stochasticity while reducing regularization pressure when the policy is highly uncertain, allowing learning to focus on reward optimization. We introduce Complexity-Regularized Proximal Policy Optimization (CR-PPO)
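To make the abstract's definition concrete, the following is a minimal sketch of a complexity term computed as Shannon entropy times disequilibrium for a discrete action distribution. The abstract only says that disequilibrium "quantifies the distance from the uniform distribution"; the squared Euclidean distance used here is an assumption, and the function names are hypothetical, not taken from the paper.

```python
import numpy as np


def shannon_entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    p = np.asarray(p, dtype=np.float64)
    nonzero = p[p > 0]  # skip zero-probability actions: 0 * log(0) is taken as 0
    return float(-np.sum(nonzero * np.log(nonzero)))


def disequilibrium(p):
    """Distance from the uniform distribution.

    ASSUMPTION: squared Euclidean distance; the abstract does not
    specify the exact distance measure.
    """
    p = np.asarray(p, dtype=np.float64)
    n = p.shape[-1]
    return float(np.sum((p - 1.0 / n) ** 2))


def complexity(p):
    """Complexity as the product of entropy and disequilibrium."""
    return shannon_entropy(p) * disequilibrium(p)


# Three illustrative policies over four actions (hypothetical values).
uniform = np.full(4, 0.25)
deterministic = np.array([1.0, 0.0, 0.0, 0.0])
mixed = np.array([0.7, 0.1, 0.1, 0.1])

for name, dist in [("uniform", uniform),
                   ("deterministic", deterministic),
                   ("mixed", mixed)]:
    print(name, complexity(dist))
```

Under these assumptions the term vanishes at both extremes: the uniform policy has zero disequilibrium and the deterministic policy has zero entropy, so only the mixed policy receives a positive value. In a PPO-style objective, such a term would take the place of the usual entropy bonus, scaled by a coefficient; how CR-PPO weights it is not stated in the abstract.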
Related papers
- Policy Optimization Reinforcement Learning With Entropy Regularization (2019)
- Understanding The Impact Of Entropy On Policy Optimization (2018)
- Proximal Policy Optimization With Relative Pearson Divergence (2020)
- Marginalized State Distribution Entropy Regularization In Policy Optimization (2019)
- Truly Proximal Policy Optimization (2019)
- Simple Policy Optimization (2024)
- Arbitrary Entropy Policy Optimization Breaks The Exploration Bottleneck Of Reinforcement Learning (2025)
- On Proximal Policy Optimization's Heavy-tailed Gradients (2021)