PolicyFlow: Policy Optimization with Continuous Normalizing Flow in Reinforcement Learning
2026 · Shunpeng Yang, Ben Liu, Hua Chen
Abstract
Among on-policy reinforcement learning algorithms, Proximal Policy Optimization (PPO) is widely favored for its simplicity, numerical stability, and strong empirical performance. Standard PPO relies on surrogate objectives defined via importance ratios, which require evaluating the policy likelihood; this is typically straightforward when the policy is modeled as a Gaussian distribution. However, extending PPO to more expressive, high-capacity policy models such as continuous normalizing flows (CNFs), also known as flow-matching models, is challenging because likelihood evaluation along the full flow trajectory is computationally expensive and often numerically unstable. To resolve this issue, we propose PolicyFlow, a novel on-policy CNF-based reinforcement learning algorithm that integrates expressive CNF policies with PPO-style objectives without requiring likelihood evaluation along the full flow path. PolicyFlow approximates importance ratios using velocity field variations
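For context, the standard setting the abstract contrasts against can be sketched as follows. This is a minimal illustration of the usual PPO clipped surrogate with a diagonal Gaussian policy, where the importance ratio comes directly from closed-form log-likelihoods. It is not PolicyFlow's velocity-field approximation, which the abstract does not spell out. The function names, the clipping value of 0.2, and the sample numbers are assumptions chosen for illustration.

```python
import math


def gaussian_log_prob(action, mean, std):
    """Log-density of a diagonal Gaussian policy evaluated at `action`."""
    return sum(
        -0.5 * ((a - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for a, m, s in zip(action, mean, std)
    )


def ppo_clip_objective(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """PPO clipped surrogate for a single sample (a quantity to maximize).

    clip_eps=0.2 is an illustrative default, not a value taken from the paper.
    """
    # Importance ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped * advantage)


# Hypothetical sample: one 2-D action scored under the old and updated policy.
action = [0.5, -0.2]
old_lp = gaussian_log_prob(action, mean=[0.0, 0.0], std=[1.0, 1.0])
new_lp = gaussian_log_prob(action, mean=[0.4, -0.1], std=[1.0, 1.0])
objective = ppo_clip_objective(new_lp, old_lp, advantage=1.0)
```

With a Gaussian policy the two log-likelihoods are cheap closed-form expressions. For a CNF policy the same quantities would require integrating along the flow trajectory, which is the cost the abstract describes.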
Related papers
- FlowPG: Action-constrained Policy Gradient With Normalizing Flows (2024) 0.00
- Truly Proximal Policy Optimization (2019) 0.00
- Simple Policy Optimization (2024) 0.00
- Proximal Policy Optimization Algorithms (2017) 0.00
- Policy Optimization With Model-based Explorations (2018) 5.84
- KIPPO: Koopman-inspired Proximal Policy Optimization (2025) 0.00
- Revisiting Design Choices In Proximal Policy Optimization (2020) 0.00
- Evolving Diffusion And Flow Matching Policies For Online Reinforcement Learning (2025) 0.00