Revisiting Generative Policies: A Simpler Reinforcement Learning Algorithmic Perspective
2024 · Jinouwen Zhang, Rongkun Xue, Yazhe Niu, et al.
Abstract
Generative models, particularly diffusion models, have achieved remarkable success in density estimation for multimodal data, drawing significant interest from the reinforcement learning (RL) community, especially in policy modeling in continuous action spaces. However, existing works exhibit significant variations in training schemes and RL optimization objectives, and some methods are only applicable to diffusion models. In this study, we compare and analyze various generative policy training and deployment techniques, identifying and validating effective designs for generative policy algorithms. Specifically, we revisit existing training objectives and classify them into two categories, each linked to a simpler approach. The first approach, Generative Model Policy Optimization (GMPO), employs a native advantage-weighted regression formulation as the training objective, which is significantly simpler than previous methods. The second approach, Generative Model Policy Gradient (GMPG),
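The GMPO description above rests on an advantage-weighted regression (AWR) objective. The sketch below shows one generic way such an objective can be attached to a generative policy: each sample's generative loss is scaled by an exponentiated advantage. It is an illustration under stated assumptions, not the authors' implementation. The weight form exp(A / beta), the clip value, the straight-line flow-matching path, and all function names (`awr_weights`, `flow_matching_losses`, `awr_generative_loss`, `velocity_fn`) are hypothetical choices made for this example.

```python
import numpy as np

def awr_weights(advantages, beta=1.0, max_weight=100.0):
    """Exponentiated-advantage weights exp(A / beta), clipped for numerical stability.

    beta (temperature) and max_weight (clip) are illustrative hyperparameters,
    not values taken from the paper.
    """
    w = np.exp(np.asarray(advantages, dtype=float) / beta)
    return np.minimum(w, max_weight)

def flow_matching_losses(velocity_fn, actions, states, rng):
    """Per-sample conditional flow-matching loss on a straight noise-to-action path.

    velocity_fn(x_t, t, s) is a hypothetical stand-in for a learned velocity network.
    """
    noise = rng.standard_normal(actions.shape)
    t = rng.uniform(size=(actions.shape[0], 1))
    x_t = (1.0 - t) * noise + t * actions   # point on the straight path at time t
    target = actions - noise                # constant velocity of that path
    pred = velocity_fn(x_t, t, states)
    return np.sum((pred - target) ** 2, axis=1)

def awr_generative_loss(velocity_fn, states, actions, advantages, rng, beta=1.0):
    """Batch mean of w_i * l_i: high-advantage actions pull harder on the model."""
    w = awr_weights(advantages, beta)
    per_sample = flow_matching_losses(velocity_fn, actions, states, rng)
    return float(np.mean(w * per_sample))

# Usage with toy data and a placeholder model that always predicts zero velocity.
rng = np.random.default_rng(0)
states = rng.standard_normal((4, 3))
actions = rng.standard_normal((4, 2))
advantages = np.array([1.0, -1.0, 0.5, 0.0])
zero_velocity = lambda x_t, t, s: np.zeros_like(x_t)
loss = awr_generative_loss(zero_velocity, states, actions, advantages, rng)
```

Because the weighting only rescales a per-sample loss, the same pattern applies to any generative model with a per-sample training loss (for example a diffusion denoising loss), which is the kind of model-agnostic objective the abstract contrasts with diffusion-only methods.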