One-step Flow Q-learning: Addressing The Diffusion Policy Bottleneck In Offline Reinforcement Learning
2025 · Thanh Nguyen, Chang D. Yoo
Abstract
Diffusion Q-Learning (DQL) has established diffusion policies as a high-performing paradigm for offline reinforcement learning, but its reliance on multi-step denoising for action generation renders both training and inference slow and fragile. Existing efforts to accelerate DQL toward one-step denoising typically rely on auxiliary modules or policy distillation, sacrificing either simplicity or performance. It remains unclear whether a one-step policy can be trained directly without such trade-offs. To this end, we introduce One-Step Flow Q-Learning (OFQL), a novel framework that enables effective one-step action generation during both training and inference, without auxiliary modules or distillation. OFQL reformulates the DQL policy within the Flow Matching (FM) paradigm but departs from conventional FM by learning an average velocity field that directly supports accurate one-step action generation. This design removes the need for multi-step denoising and backpropagation-through-tim
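The contrast the abstract draws between multi-step denoising and one-step generation from an average velocity field can be illustrated with a toy example. The sketch below is not the OFQL method: it assumes a hand-picked linear field `dx/dt = -x` whose average velocity over a time interval is known in closed form, and it assumes the common definition of average velocity as displacement divided by elapsed time. All function names are hypothetical. It shows only why a single step along the average velocity can land exactly where many small steps along the instantaneous velocity converge.

```python
import numpy as np

def instantaneous_velocity(x, t):
    # Toy field (an assumption for illustration): dx/dt = -x,
    # whose exact solution is x(t) = x(0) * exp(-t).
    return -x

def average_velocity(x, r, t):
    # Closed-form average velocity of the toy field between times r and t,
    # i.e. (x(t) - x(r)) / (t - r) for a trajectory passing through x at time r.
    return x * (np.exp(-(t - r)) - 1.0) / (t - r)

def euler_sample(x0, num_steps):
    # Multi-step generation: integrate the instantaneous field from t=0 to t=1.
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * instantaneous_velocity(x, k * dt)
    return x

def one_step_sample(x0):
    # One-step generation: a single jump along the average velocity over [0, 1].
    x0 = np.asarray(x0, dtype=float)
    return x0 + (1.0 - 0.0) * average_velocity(x0, 0.0, 1.0)

x0 = np.array([1.0, -2.0])
exact = x0 * np.exp(-1.0)

print("exact endpoint       :", exact)
print("Euler, 1 step        :", euler_sample(x0, 1))
print("Euler, 1000 steps    :", euler_sample(x0, 1000))
print("average vel., 1 step :", one_step_sample(x0))
```

In this toy setting a single Euler step along the instantaneous velocity misses the endpoint badly, while a single step along the average velocity reaches it exactly. In a learned policy the average velocity would be approximated by a network rather than given in closed form, so the one-step result would only be as accurate as that approximation.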
Related papers
- Diffusion Policies Creating A Trust Region For Offline Reinforcement Learning (2024)
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022)
- Reverse Flow Matching: A Unified Framework For Online Reinforcement Learning With Diffusion And Flow Policies (2026)
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024)
- Evolving Diffusion And Flow Matching Policies For Online Reinforcement Learning (2025)
- Boosting Continuous Control With Consistency Policy (2023)
- Entropy-regularized Diffusion Policy With Q-ensembles For Offline Reinforcement Learning (2024)
- Diffusion Policies With Value-conditional Optimization For Offline Reinforcement Learning (2025)