PPO in the Fisher-Rao Geometry
2025 · Razvan-Andrei Lascu, David Šiška, Łukasz Szpruch
Abstract
Proximal Policy Optimization (PPO) is widely used in reinforcement learning due to its strong empirical performance, yet it lacks formal guarantees for policy improvement and convergence. PPO's clipped surrogate objective is motivated by a lower bound on a linearization of the value function in a flat-geometry setting. We derive a tighter surrogate objective and introduce Fisher-Rao PPO (FR-PPO) by leveraging the Fisher-Rao (FR) geometry. Our scheme provides strong theoretical guarantees, including monotonic policy improvement. In the direct parametrization setting, we show that FR-PPO achieves sub-linear convergence with no dependence on the action or state space dimensions, and for parametrized policies we further obtain sub-linear convergence up to the compatible function approximation error. Finally, although our primary focus is theoretical, we also demonstrate empirically that FR-PPO performs well across a range of standard reinforcement learning tasks.
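For context, the clipped surrogate objective that the abstract refers to can be sketched as below. This is the standard PPO objective from the 2017 PPO paper, not the FR-PPO surrogate, which the abstract does not state. The function name `clipped_surrogate`, its argument names, and the default `eps=0.2` are illustrative assumptions.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Mean standard PPO clipped surrogate over a batch (to be maximized).

    logp_new / logp_old: log-probabilities of the sampled actions under the
    new and old policies. advantages: advantage estimates for those actions.
    eps: clipping radius (hypothetical default, chosen for illustration).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic choice: the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

When the new and old policies coincide, every ratio equals 1 and the value reduces to the mean advantage. Clipping only takes effect once the ratio leaves the interval `[1 - eps, 1 + eps]` in the direction that would increase the objective.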
Related papers
- Truly Proximal Policy Optimization (2019)
- Simple Policy Optimization (2024)
- Proximal Policy Optimization via Enhanced Exploration Efficiency (2020)
- KIPPO: Koopman-Inspired Proximal Policy Optimization (2025)
- CIM-PPO: Proximal Policy Optimization with Liu-Correntropy Induced Metric (2021)
- Proximal Policy Optimization Algorithms (2017)
- A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes (2023)
- Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy (2019)