Regret Bounds For Decentralized Learning In Cooperative Multi-agent Dynamical Systems
2020 · Seyed Mohammad Asghari, Yi Ouyang, Ashutosh Nayyar
Abstract
Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL) primarily due to the dynamical environments and the decentralized information among agents. We attempt to solve this challenge in the context of decentralized learning in multi-agent linear-quadratic (LQ) dynamical systems. We begin with a simple setup consisting of two agents and two dynamically decoupled stochastic linear systems, each system controlled by an agent. The systems are coupled through a quadratic cost function. When both systems' dynamics are unknown and there is no communication among the agents, we show that no learning policy can achieve regret that is sub-linear in \(T\), where \(T\) is the time horizon. When only one system's dynamics are unknown and there is one-directional communication from the agent controlling the unknown system to the other agent, we propose a MARL algorithm based on the construction of an auxiliary single-agent LQ problem. The auxiliary single-agent problem in the proposed M
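As a rough illustration of the setup described in the abstract, the following sketch simulates two dynamically decoupled scalar stochastic linear systems whose only coupling is a cross term in the quadratic cost. It is not the paper's algorithm: the function name `simulate`, the scalar dynamics, the fixed feedback gains, and all numeric values are assumptions chosen for illustration, and each agent here simply applies a fixed linear feedback on its own state with no learning and no communication.

```python
import random


def simulate(a, b, gains, q, r, horizon, noise_std=1.0, seed=0):
    """Average per-step cost of two decoupled scalar systems (hypothetical helper).

    System i evolves as x_i <- a[i]*x_i + b[i]*u_i + w_i, with Gaussian noise w_i.
    Agent i uses only its own state: u_i = -gains[i] * x_i.
    The per-step cost is quadratic and couples the systems through q[0][1].
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]
    total = 0.0
    for _ in range(horizon):
        # Decentralized control: each input depends on the local state only.
        u = [-gains[i] * x[i] for i in range(2)]
        # Coupled quadratic cost; the 2*q[0][1]*x1*x2 term links the two systems.
        total += (q[0][0] * x[0] ** 2 + 2 * q[0][1] * x[0] * x[1]
                  + q[1][1] * x[1] ** 2 + r[0] * u[0] ** 2 + r[1] * u[1] ** 2)
        # Decoupled dynamics: neither system's state enters the other's update.
        x = [a[i] * x[i] + b[i] * u[i] + rng.gauss(0.0, noise_std)
             for i in range(2)]
    return total / horizon


# Illustrative (assumed) parameters: both closed loops are stable.
A = [0.9, 0.8]
B = [1.0, 1.0]
GAINS = [0.5, 0.4]
Q = [[1.0, 0.5], [0.5, 1.0]]  # positive semidefinite, off-diagonal couples costs
R = [1.0, 1.0]

avg_cost = simulate(A, B, GAINS, Q, R, horizon=1000)
```

In this picture, the regret of a learning policy over horizon \(T\) would be its cumulative cost minus \(T\) times the optimal average cost achievable when the dynamics are known; the sketch only computes the average cost of one fixed policy and does not compute that optimum.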
Related papers
- Inducing Cooperation Via Team Regret Minimization Based Multi-agent Deep Reinforcement Learning (2019) 0.00
- Regret-minimization Algorithms For Multi-agent Cooperative Learning Systems (2023) 0.00
- Impact Of Decentralized Learning On Player Utilities In Stackelberg Games (2024) 0.00
- Provably Efficient Multi-agent Reinforcement Learning With Fully Decentralized Communication (2021) 0.00
- Implications Of Regret On Stability Of Linear Dynamical Systems (2022) 6.34
- On Improving Model-free Algorithms For Decentralized Multi-agent Reinforcement Learning (2021) 0.00
- MA2QL: A Minimalist Approach To Fully Decentralized Multi-agent Reinforcement Learning (2022) 0.00
- Distributed No-regret Learning In Multi-agent Systems (2020) 0.00