FM-IRL: Flow-matching For Reward Modeling And Policy Regularization In Reinforcement Learning
2025 · Zhenglin Wan, Jingxuan Wu, Xingrui Yu, et al.
Abstract
Flow Matching (FM) has shown remarkable ability in modeling complex distributions and achieves strong performance in offline imitation learning for cloning expert behaviors. However, despite their expressiveness in behavioral cloning, FM-based policies are inherently limited by their lack of environmental interaction and exploration. This leads to poor generalization in unseen scenarios beyond the expert demonstrations, underscoring the necessity of online interaction with the environment. Unfortunately, optimizing FM policies via online interaction is challenging and inefficient due to instability in gradient computation and high inference costs. To address these issues, we propose to let a student policy with a simple MLP structure explore the environment and be updated online via an RL algorithm with a reward model. This reward model is associated with a teacher FM model, which contains rich information about the expert data distribution. Furthermore, the same teacher FM model is utilized to regularize the
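As a rough illustration of how a teacher FM model could score behavior, the sketch below estimates the standard conditional flow-matching loss on a straight-line path between noise and data, then treats its negative as a reward. The abstract does not say how the reward is actually derived from the teacher, so the negative-loss reward is an assumption of this sketch, and the names `fm_loss`, `oracle_velocity`, and `reward` are hypothetical. The "teacher" here is a toy closed-form velocity field for a single expert point, not a trained network.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 to data x1 at time t."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight-line path; it does not depend on t."""
    return [b - a for a, b in zip(x0, x1)]

def fm_loss(velocity_fn, x1, n_samples=64, rng=None):
    """Monte Carlo estimate of the conditional flow-matching loss at x1."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_samples):
        x0 = [rng.gauss(0.0, 1.0) for _ in x1]   # noise sample
        # Cap t below 1 so the toy oracle below stays finite (sketch choice).
        t = rng.uniform(0.0, 0.95)
        pred = velocity_fn(interpolate(x0, x1, t), t)
        tgt = target_velocity(x0, x1)
        total += sum((p - q) ** 2 for p, q in zip(pred, tgt))
    return total / n_samples

def oracle_velocity(center):
    """Toy teacher: exact velocity field when all expert data sits at `center`."""
    def v(xt, t):
        return [(c - x) / (1.0 - t) for c, x in zip(center, xt)]
    return v

def reward(velocity_fn, x1):
    """Assumed reward: lower teacher FM loss means more expert-like."""
    return -fm_loss(velocity_fn, x1)

teacher = oracle_velocity([1.0, 2.0])
r_expert = reward(teacher, [1.0, 2.0])   # sample matching the expert point
r_other = reward(teacher, [3.0, 2.0])    # sample away from the expert point
print(r_expert > r_other)
```

Under these assumptions a sample that coincides with the expert data incurs essentially zero teacher loss, while a sample away from it incurs a strictly larger loss, so the student would be rewarded for staying near the expert distribution.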
Related papers
- Reverse Flow Matching: A Unified Framework For Online Reinforcement Learning With Diffusion And Flow Policies (2026)
- Evolving Diffusion And Flow Matching Policies For Online Reinforcement Learning (2025)
- Controllable Flow Matching For Online Reinforcement Learning (2025)
- Guided Flow Policy: Learning From High-value Actions In Offline Reinforcement Learning (2025)
- Composite Flow Matching For Reinforcement Learning With Shifted-dynamics Data (2025)
- Flow To Control: Offline Reinforcement Learning With Lossless Primitive Discovery (2022)
- Value-guidance Meanflow For Offline Multi-agent Reinforcement Learning (2026)
- Online Matching Via Reinforcement Learning: An Expert Policy Orchestration Strategy (2025)