GenPO: Generative Diffusion Models Meet On-policy Reinforcement Learning
2025 · Shutong Ding, Ke Hu, Shan Zhong, et al.
Abstract
Recent advances in reinforcement learning (RL) have demonstrated the powerful exploration capabilities and multimodality of generative diffusion-based policies. While substantial progress has been made in offline RL and off-policy RL settings, integrating diffusion policies into on-policy frameworks like PPO remains underexplored. This gap is particularly significant given the widespread use of large-scale parallel GPU-accelerated simulators, such as IsaacLab, which are optimized for on-policy RL algorithms and enable rapid training of complex robotic tasks. A key challenge lies in computing state-action log-likelihoods under diffusion policies, which is straightforward for Gaussian policies but intractable for flow-based models due to irreversible forward-reverse processes and discretization errors (e.g., Euler-Maruyama approximations). To bridge this gap, we propose GenPO, a generative policy optimization framework that leverages exact diffusion inversion to construct invertible acti
Related papers
- DiffPoGAN: Diffusion Policies With Generative Adversarial Networks For Offline Reinforcement Learning (2024)
- Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective (2024)
- Diffusion Policy Through Conditional Proximal Policy Optimization (2026)
- Evolving Diffusion And Flow Matching Policies For Online Reinforcement Learning (2025)
- Policy Representation Via Diffusion Probability Model For Reinforcement Learning (2023)
- Dichotomous Diffusion Policy Optimization (2025)
- Fine-tuning Diffusion Policies With Backpropagation Through Diffusion Timesteps (2025)
- Reverse Flow Matching: A Unified Framework For Online Reinforcement Learning With Diffusion And Flow Policies (2026)