ANO: A Principled Approach To Robust Policy Optimization
2026 · Yiheng Zhang, Yiming Wang, Kaiyan Zhao, et al.
Abstract
arXiv:2605.02320v2. Proximal Policy Optimization (PPO) dominates reinforcement learning and LLM alignment but relies on a "hard clipping" mechanism that discards valuable gradients. Conversely, unconstrained methods like SPO expose the optimization to unbounded updates, causing severe instability and policy collapse during extreme outlier encounters. To resolve this dilemma, we introduce a principled design space for policy optimization, demonstrating that a robust estimator must inherently suppress outliers while maintaining a smooth restoration force. Guided by these geometric principles, we derive Anchored Neighborhood Optimization (ANO), a novel method that seamlessly replaces hard clipping with a redescending gradient mechanism. Extensive evaluations demonstrate ANO's empirical superiority across diverse domains. In continuous (MuJoCo) and discrete (Atari) control, ANO establishes a robust state-of-the-art, uniquely preventing policy collapse
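To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' method. The first function computes the gradient of the standard PPO clipped surrogate, min(r·A, clip(r, 1-ε, 1+ε)·A), with respect to the probability ratio r, showing that it drops abruptly to zero once the clip is active. The second is Tukey's biweight influence function from robust statistics, used here only as a generic example of a "redescending" shape that fades smoothly to zero for outliers. The ANO objective itself is not given in the abstract, so the function names, the choice of Tukey's biweight, and the default values `eps=0.2` and `c=1.0` are all illustrative assumptions.

```python
def ppo_clip_grad(ratio, advantage, eps=0.2):
    """Gradient w.r.t. `ratio` of the PPO clipped surrogate
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # The min selects the unclipped term whenever it is the smaller one;
    # that term has slope A. Otherwise the clipped (constant) term wins
    # and the gradient is exactly zero: the sample is discarded.
    if ratio * advantage <= clipped * advantage:
        return advantage
    return 0.0


def tukey_biweight_psi(u, c=1.0):
    """Tukey's biweight influence function (generic robust-statistics
    example of a redescending function; NOT the ANO formula)."""
    if abs(u) > c:
        return 0.0  # outliers beyond the cutoff c get no influence
    return u * (1.0 - (u / c) ** 2) ** 2  # decays smoothly to 0 at |u| = c


if __name__ == "__main__":
    # Hard clipping: the gradient jumps from A to 0 at the clip boundary.
    for r in (1.0, 1.1, 1.3, 1.5):
        print(f"ratio={r:.1f}  ppo_grad={ppo_clip_grad(r, advantage=1.0)}")
    # Redescending shape: influence rises, then falls back to 0 smoothly.
    for u in (0.0, 0.25, 0.5, 0.75, 1.0, 2.0):
        print(f"u={u:.2f}  psi={tukey_biweight_psi(u):.5f}")
```

With a positive advantage, the PPO gradient is the full advantage for every ratio up to 1 + ε and exactly zero beyond it, whereas a redescending function shrinks its response continuously as the input moves away from the center.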
Related papers
- Revisiting Design Choices In Proximal Policy Optimization (2020)
- Truly Proximal Policy Optimization (2019)
- Proximal Policy Optimization Algorithms (2017)
- Simple Policy Optimization (2024)
- Absolute Policy Optimization (2023)
- KIPPO: Koopman-inspired Proximal Policy Optimization (2025)
- AM-PPO: (advantage) Alpha-modulation With Proximal Policy Optimization (2025)
- Proximal Policy Optimization With Adaptive Exploration (2024)