Bregman Gradient Policy Optimization
2021 · Feihu Huang, Shangqian Gao, Heng Huang
Abstract
In this paper, we design a novel Bregman gradient policy optimization framework for reinforcement learning based on Bregman divergences and momentum techniques. Specifically, we propose a Bregman gradient policy optimization (BGPO) algorithm based on the basic momentum technique and mirror descent iteration. We further propose an accelerated Bregman gradient policy optimization (VR-BGPO) algorithm based on a variance-reduced technique. Moreover, we provide a convergence analysis framework for our Bregman gradient policy optimization under the nonconvex setting. We prove that BGPO achieves a sample complexity of \(O(\epsilon^{-4})\) for finding an \(\epsilon\)-stationary policy while requiring only one trajectory at each iteration, and that VR-BGPO reaches the best known sample complexity of \(O(\epsilon^{-3})\), also requiring only one trajectory at each iteration. In particular, by using different Bregman divergences, our BGPO framework unifies many existing policy opti
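The abstract names two ingredients, a momentum gradient estimate and a mirror descent iteration under a chosen Bregman divergence, without giving the update rule. The sketch below is a generic illustration of those two ingredients, not the paper's BGPO algorithm: all function names, the toy linear objective, and the step sizes are assumptions made for this example. It shows how the choice of mirror map changes the step: the squared Euclidean norm gives a plain gradient step, while negative entropy on the probability simplex (whose Bregman divergence is the KL divergence) gives a multiplicative update.

```python
import math


def momentum_estimate(prev_estimate, new_grad, beta):
    """Basic momentum: exponential moving average of (stochastic) gradients."""
    return [(1.0 - beta) * m + beta * g for m, g in zip(prev_estimate, new_grad)]


def bregman_step_euclidean(theta, grad, lr):
    """Mirror descent step with mirror map 0.5 * ||x||^2.

    The Bregman divergence is the squared Euclidean distance, so the
    step reduces to ordinary gradient descent.
    """
    return [t - lr * g for t, g in zip(theta, grad)]


def bregman_step_neg_entropy(p, grad, lr):
    """Mirror descent step on the simplex with mirror map sum_i p_i log p_i.

    The Bregman divergence is the KL divergence, and the minimizer has a
    closed form: reweight each entry by exp(-lr * grad_i), then renormalize.
    """
    weights = [pi * math.exp(-lr * gi) for pi, gi in zip(p, grad)]
    total = sum(weights)
    return [w / total for w in weights]


def run_demo(steps=200, lr=0.5, beta=0.5):
    """Minimize the toy linear loss <cost, p> over the probability simplex.

    Hypothetical setup: in a policy-gradient setting the gradient would be
    a noisy estimate from one sampled trajectory; here it is exact.
    """
    cost = [1.0, 0.2, 0.7]
    p = [1.0 / 3.0] * 3
    m = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = cost  # gradient of <cost, p> with respect to p
        m = momentum_estimate(m, grad, beta)
        p = bregman_step_neg_entropy(p, m, lr)
    return p


if __name__ == "__main__":
    print(run_demo())
```

In this toy problem the iterate stays a valid probability vector at every step and concentrates its mass on the lowest-cost entry; swapping in `bregman_step_euclidean` would instead require a separate projection to stay on the simplex.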
Related papers
- Policy Optimization With Stochastic Mirror Descent (2019)
- Proximal Policy Optimization Algorithms (2017)
- Beyond KL Divergence: Policy Optimization With Flexible Bregman Divergences For LLM Reasoning (2026)
- MDPGT: Momentum-based Decentralized Policy Gradient Tracking (2021)
- Divergence-augmented Policy Optimization (2025)
- Policy Gradient For Robust Markov Decision Processes (2024)
- Variance Reduction Based Partial Trajectory Reuse To Accelerate Policy Gradient Optimization (2022)
- Stochastic Variance Reduction For Policy Gradient Estimation (2017)