Factored Value Functions For Graph-based Multi-agent Reinforcement Learning
2026 · Ahmed Rashwan, Keith Briggs, Chris Budd, et al.
Abstract
Credit assignment is a core challenge in multi-agent reinforcement learning (MARL), especially in large-scale systems with structured, local interactions. Graph-based Markov decision processes (GMDPs) capture such settings via an influence graph, but standard critics are poorly aligned with this structure: global value functions provide weak per-agent learning signals, while existing local constructions can be difficult to estimate and ill-behaved in infinite-horizon settings. We introduce the Diffusion Value Function (DVF), a factored value function for GMDPs that assigns to each agent a value component by diffusing rewards over the influence graph with temporal discounting and spatial attenuation. We show that DVF is well-defined, admits a Bellman fixed point, and decomposes the global discounted value via an averaging property. DVF can be used as a drop-in critic in standard RL algorithms and estimated scalably with graph neural networks. Building on DVF, we propose Diffusion A2C (D
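To make the idea of "diffusing rewards over the influence graph with temporal discounting and spatial attenuation" concrete, here is a minimal toy sketch. It is not the paper's definition of DVF, which the abstract does not spell out. Everything specific is an assumption made for illustration: the function name diffusion_returns, the attenuation parameter lam, the lazy random-walk mixing matrix W, and the use of a finite rollout in place of the infinite-horizon setting. In this toy version the per-agent components sum to the global discounted return because W is column-stochastic; the paper's averaging property may be stated differently.

```python
import numpy as np

def diffusion_returns(rewards, adjacency, gamma=0.9, lam=0.5):
    """Toy per-agent returns in which rewards spread over the graph as they age.

    rewards:   array of shape (T, n); rewards[t, i] is agent i's reward at step t
    adjacency: (n, n) symmetric 0/1 influence graph with no isolated nodes
               (hypothetical input format)
    gamma:     temporal discount factor
    lam:       assumed spatial attenuation in [0, 1]; share of a reward that
               stays on the agent that earned it at each diffusion step
    """
    n = adjacency.shape[0]
    deg = adjacency.sum(axis=0)          # column sums (node degrees)
    walk = adjacency / deg               # divide column j by deg[j]: column-stochastic
    W = lam * np.eye(n) + (1.0 - lam) * walk
    values = np.zeros(n)
    # Backward recursion v_t = r_t + gamma * W v_{t+1}, which unrolls to
    # v_0 = sum_t gamma^t W^t r_t for the finite rollout.
    for r in rewards[::-1]:
        values = r + gamma * (W @ values)
    return values

# Usage on a 3-agent path graph 0 - 1 - 2 with random rewards.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
R = np.random.default_rng(0).normal(size=(5, 3))
v = diffusion_returns(R, A, gamma=0.9, lam=0.5)
global_return = sum(0.9 ** t * R[t].sum() for t in range(5))
print(np.isclose(v.sum(), global_return))
```

The recursion has the same shape as a Bellman backup (immediate reward plus a discounted, graph-mixed continuation), which is the property that would let components like these be learned as a critic. How the paper actually weights neighbours and handles the infinite horizon is not recoverable from the abstract.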
Related papers
- A Unified Framework For Factorizing Distributional Value Functions For Multi-agent Reinforcement Learning (2023)
- DFAC Framework: Factorizing The Value Function Via Quantile Mixture For Multi-agent Distributional Q-learning (2021)
- Adaptive Value Decomposition With Greedy Marginal Contribution Computation For Cooperative Multi-agent Reinforcement Learning (2023)
- Residual Q-networks For Value Function Factorizing In Multi-agent Reinforcement Learning (2022)
- Distributed Multi-agent Reinforcement Learning Based On Graph-induced Local Value Functions (2022)
- Value Propagation For Decentralized Networked Deep Multi-agent Reinforcement Learning (2019)
- Qfree: A Universal Value Function Factorization For Multi-agent Reinforcement Learning (2023)
- Value-guidance Meanflow For Offline Multi-agent Reinforcement Learning (2026)