Reverse Flow Matching: A Unified Framework For Online Reinforcement Learning With Diffusion And Flow Policies
2026 · Zeyang Li, Sunbochen Tang, Navid Azizan
Abstract
Diffusion and flow policies are gaining prominence in online reinforcement learning (RL) due to their expressive power, yet training them efficiently remains a critical challenge. A fundamental difficulty in online RL is the lack of direct samples from the target distribution; instead, the target is an unnormalized Boltzmann distribution defined by the Q-function. To address this, two seemingly distinct families of methods have been proposed for diffusion policies: a noise-expectation family, which utilizes a weighted average of noise as the training target, and a gradient-expectation family, which employs a weighted average of Q-function gradients. Yet, it remains unclear how these objectives relate formally or if they can be synthesized into a more general formulation. In this paper, we propose a unified framework, reverse flow matching (RFM), which rigorously addresses the problem of training diffusion and flow models without direct target samples. By adopting a reverse inferential
Related papers
- Evolving Diffusion And Flow Matching Policies For Online Reinforcement Learning (2025)
- FM-IRL: Flow-matching For Reward Modeling And Policy Regularization In Reinforcement Learning (2025)
- DiffusionNFT: Online Diffusion Reinforcement With Forward Process (2025)
- GenPO: Generative Diffusion Models Meet On-policy Reinforcement Learning (2025)
- Composite Flow Matching For Reinforcement Learning With Shifted-dynamics Data (2025)
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022)
- Policy Representation Via Diffusion Probability Model For Reinforcement Learning (2023)
- One-step Flow Q-learning: Addressing The Diffusion Policy Bottleneck In Offline Reinforcement Learning (2025)