AM-PPO: (Advantage) Alpha-Modulation with Proximal Policy Optimization
2025 · Soham Sane
Abstract
Proximal Policy Optimization (PPO) is a widely used reinforcement learning algorithm that heavily relies on accurate advantage estimates for stable and efficient training. However, raw advantage signals can exhibit significant variance, noise, and scale-related issues, impeding optimal learning performance. To address this challenge, we introduce Advantage Modulation PPO (AM-PPO), a novel enhancement of PPO that adaptively modulates advantage estimates using a dynamic, non-linear scaling mechanism. This adaptive modulation employs an alpha controller that dynamically adjusts the scaling factor based on evolving statistical properties of the advantage signals, such as their norm, variance, and a predefined target saturation level. By incorporating a tanh-based gating function driven by these adaptively scaled advantages, AM-PPO reshapes the advantage signals to stabilize gradient updates and improve the conditioning of the policy gradient landscape. Crucially, this modulation also influ
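The mechanism the abstract describes can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function name `modulate_advantages`, the RMS normalization, the use of the mean absolute gate output as the "saturation" measure, the multiplicative update rule, and the default values of `target_saturation` and `rate`. The variance term mentioned in the abstract is omitted. This is not the paper's implementation.

```python
import math


def modulate_advantages(advantages, alpha, target_saturation=0.6,
                        rate=0.1, eps=1e-8):
    """Hypothetical tanh gating of advantages with an adaptive scale alpha.

    Returns (gated_advantages, updated_alpha). All formulas here are
    illustrative assumptions, not taken from the AM-PPO paper.
    """
    n = len(advantages)
    # Normalize by the batch RMS norm so alpha does not depend on the
    # raw scale of the advantage signal.
    rms = math.sqrt(sum(a * a for a in advantages) / n) + eps
    # Tanh gate: bounds every modulated advantage to [-1, 1] and
    # preserves its sign.
    gated = [math.tanh(alpha * a / rms) for a in advantages]
    # Assumed saturation measure: mean absolute gate output.
    saturation = sum(abs(g) for g in gated) / n
    # Assumed controller: raise alpha when the gate is under-saturated
    # relative to the target, lower it when over-saturated.
    new_alpha = alpha * math.exp(rate * (target_saturation - saturation))
    return gated, new_alpha


if __name__ == "__main__":
    alpha = 1.0
    batch = [0.5, -1.5, 2.0, -0.25, 3.0]  # made-up advantage values
    for step in range(3):
        gated, alpha = modulate_advantages(batch, alpha)
        print(step, [round(g, 3) for g in gated], round(alpha, 4))
```

In a PPO update, the gated values would stand in for the raw advantages in the clipped surrogate objective, and `alpha` would be carried across batches. How AM-PPO actually combines these pieces is specified in the paper, not here.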