MARL With General Utilities Via Decentralized Shadow Reward Actor-critic
2021 · Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, et al.
Abstract
We posit a new mechanism for cooperation in multi-agent reinforcement learning (MARL) based upon any nonlinear function of the team's long-term state-action occupancy measure, i.e., a *general utility*. This subsumes the cumulative return but also allows one to incorporate risk-sensitivity, exploration, and priors. We derive the Decentralized Shadow Reward Actor-Critic (DSAC), in which agents alternate between policy evaluation (critic), weighted averaging with neighbors (information mixing), and local gradient updates for their policy parameters (actor). DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the "shadow reward". DSAC converges to \(\epsilon\)-stationarity in \(\mathcal{O}(1/\epsilon^{2.5})\) (Theorem \ref{theorem:final}) or faster \(\mathcal{O}(1/\epsilon^{2})\) (Corolla
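The three ingredients the abstract adds to a standard decentralized actor-critic (estimating a local occupancy measure, turning it into a shadow reward, and mixing parameters with neighbors) can be illustrated with a small tabular sketch. This is not the authors' implementation: the Monte Carlo occupancy estimator, the choice of the entropy utility \(F(\lambda) = -\sum_{s,a} \lambda(s,a)\log\lambda(s,a)\) as one example of an exploration-style general utility, the doubly stochastic mixing matrix `W`, and the function names `estimate_occupancy`, `entropy_shadow_reward`, and `mix_parameters` are all hypothetical choices made here for illustration. The actor (policy-gradient) step is omitted.

```python
import numpy as np


def estimate_occupancy(trajectories, n_states, n_actions, gamma):
    """Monte Carlo estimate of the discounted state-action occupancy measure.

    Illustrative assumption: each trajectory is a list of (state, action) index
    pairs, and lambda(s, a) ~ (1 - gamma) * sum_t gamma^t * 1{s_t = s, a_t = a},
    averaged over trajectories. Truncated trajectories of length T give a
    measure that sums to 1 - gamma^T rather than exactly 1.
    """
    lam = np.zeros((n_states, n_actions))
    for traj in trajectories:
        for t, (s, a) in enumerate(traj):
            lam[s, a] += (1.0 - gamma) * gamma ** t
    return lam / len(trajectories)


def entropy_shadow_reward(lam, eps=1e-8):
    """Shadow reward for the (assumed) entropy utility F(lam) = -sum lam * log(lam).

    The shadow reward is the gradient dF/dlam(s, a) = -(log lam(s, a) + 1);
    eps guards against log(0) for unvisited pairs.
    """
    return -(np.log(lam + eps) + 1.0)


def mix_parameters(thetas, W):
    """Information-mixing step: row i of the result is sum_j W[i, j] * theta_j.

    Assumption: W is doubly stochastic and W[i, j] > 0 only when agents i and j
    are neighbors in the communication graph; thetas has one row per agent.
    """
    return W @ thetas


# Tiny example: one trajectory over 2 states and 2 actions, two fully connected agents.
trajectories = [[(0, 1), (1, 0), (1, 0)]]
lam = estimate_occupancy(trajectories, n_states=2, n_actions=2, gamma=0.5)
# lam[0, 1] == 0.5 and lam[1, 0] == 0.375
shadow = entropy_shadow_reward(lam)
W = np.array([[0.5, 0.5], [0.5, 0.5]])
thetas = np.array([[1.0, 0.0], [3.0, 2.0]])
mixed = mix_parameters(thetas, W)
# mixed == [[2.0, 1.0], [2.0, 1.0]]
```

Under this entropy utility, rarely visited state-action pairs receive a large positive shadow reward, which is how such a utility pushes agents toward exploration; a different differentiable utility would only change the shadow-reward function, while the critic would then evaluate the policy against that reward as if it were an ordinary one.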
Related papers
- Multi-agent Reinforcement Learning In Stochastic Networked Systems (2020)
- Learning To Coordinate In Multi-agent Systems: A Coordinated Actor-critic Algorithm And Finite-time Guarantees (2021)
- Fully Decentralized Multi-agent Reinforcement Learning With Networked Agents (2018)
- Scalable Multi-agent Reinforcement Learning For Networked Systems With Average Reward (2020)
- F2A2: Flexible Fully-decentralized Approximate Actor-critic For Cooperative Multi-agent Reinforcement Learning (2020)
- Modeling The Interaction Between Agents In Cooperative Multi-agent Reinforcement Learning (2021)
- Context-aware Bayesian Network Actor-critic Methods For Cooperative Multi-agent Reinforcement Learning (2023)
- Policy Distillation And Value Matching In Multiagent Reinforcement Learning (2019)