Value-guidance Meanflow For Offline Multi-agent Reinforcement Learning
2026 · Teng Pang, Zhiqiang Dong, Yan Zhang, et al.
Abstract
Offline multi-agent reinforcement learning (MARL) aims to learn an optimal joint policy from pre-collected datasets, which requires a trade-off between maximizing global returns and mitigating distribution shift from the offline data. Recent studies use diffusion or flow generative models to capture complex joint policy behaviors among agents; however, they typically rely on multi-step iterative sampling, which reduces training and inference efficiency. Although later work improves sampling efficiency through techniques such as distillation, these methods remain sensitive to the behavior regularization coefficient. To address these issues, we propose Value Guidance Multi-agent MeanFlow Policy (VGM\(^2\)P), a simple yet effective flow-based policy learning framework that enables efficient action generation with coefficient-insensitive conditional behavior cloning. Specifically, VGM\(^2\)P uses global advantage values to guide agent collaboration, treating optimal policy learning as condi
Related papers
- Evolving Diffusion And Flow Matching Policies For Online Reinforcement Learning (2025)
- Learning From Good Trajectories In Offline Multi-agent Reinforcement Learning (2022)
- Guided Flow Policy: Learning From High-value Actions In Offline Reinforcement Learning (2025)
- Value Propagation For Decentralized Networked Deep Multi-agent Reinforcement Learning (2019)
- Factored Value Functions For Graph-based Multi-agent Reinforcement Learning (2026)
- Offline Multi-agent Reinforcement Learning With Implicit Global-to-local Value Regularization (2023)
- Policy Distillation And Value Matching In Multiagent Reinforcement Learning (2019)
- Incentivize Without Bonus: Provably Efficient Model-based Online Multi-agent RL For Markov Games (2025)