On Convergence And Optimality Of Best-response Learning With Policy Types In Multiagent Systems
2019 · Stefano V. Albrecht, Subramanian Ramamoorthy
Abstract
While many multiagent algorithms are designed for homogeneous systems (i.e. all agents are identical), there are important applications which require an agent to coordinate its actions without knowing a priori how the other agents behave. One method to make this problem feasible is to assume that the other agents draw their latent policy (or type) from a specific set, and that a domain expert could provide a specification of this set, albeit only a partially correct one. Algorithms have been proposed by several researchers to compute posterior beliefs over such policy libraries, which can then be used to determine optimal actions. In this paper, we provide theoretical guidance on two central design parameters of this method: Firstly, it is important that the user choose a posterior which can learn the true distribution of latent types, as otherwise suboptimal actions may be chosen. We analyse convergence properties of two existing posterior formulations and propose a new posterior whic
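The abstract does not spell out which posterior formulations it analyses. As a rough illustration only, the Python sketch below assumes a standard Bayesian posterior over a small library of policy types: the prior is multiplied by the likelihood of the other agent's observed actions under each type. It then picks a myopic (one-step) best response to that belief. The type names, the payoff matrix, and the functions `posterior` and `best_response` are hypothetical and are not taken from the paper. The paper's formulations, and the way they plan over future steps, may differ.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical policy type: maps the other agent's past actions to a
# probability distribution over its next action (index = action).
PolicyType = Callable[[Sequence[int]], List[float]]


def posterior(types: Dict[str, PolicyType],
              prior: Dict[str, float],
              history: Sequence[int]) -> Dict[str, float]:
    """Bayesian posterior over types: P(theta | H) is proportional to
    P(theta) * prod_t pi_theta(H[:t])[H[t]] (an assumed formulation)."""
    unnormalised = {}
    for name, pi in types.items():
        likelihood = 1.0
        for t, action in enumerate(history):
            # Probability that this type would have played the observed action,
            # given the history up to (but excluding) step t.
            likelihood *= pi(history[:t])[action]
        unnormalised[name] = prior[name] * likelihood
    total = sum(unnormalised.values())
    if total == 0.0:
        # No type in the library explains the history; fall back to the prior.
        return dict(prior)
    return {name: w / total for name, w in unnormalised.items()}


def best_response(types: Dict[str, PolicyType],
                  belief: Dict[str, float],
                  history: Sequence[int],
                  payoff: List[List[float]]) -> int:
    """Myopic best response: the own action maximising expected one-step
    payoff, where payoff[a_i][a_j] is our payoff for own action a_i and
    the other agent's action a_j."""
    def expected_payoff(a_i: int) -> float:
        return sum(
            belief[name] * sum(p * payoff[a_i][a_j]
                               for a_j, p in enumerate(pi(history)))
            for name, pi in types.items()
        )
    return max(range(len(payoff)), key=expected_payoff)


if __name__ == "__main__":
    # Two hypothetical types that ignore the history.
    types = {"mostly_0": lambda h: [0.9, 0.1],
             "mostly_1": lambda h: [0.1, 0.9]}
    prior = {"mostly_0": 0.5, "mostly_1": 0.5}
    history = [1, 1, 0]  # observed actions of the other agent
    belief = posterior(types, prior, history)
    print({k: round(v, 3) for k, v in belief.items()})  # {'mostly_0': 0.1, 'mostly_1': 0.9}
    # Pure coordination game: payoff 1 if both agents pick the same action.
    print(best_response(types, belief, history, [[1.0, 0.0], [0.0, 1.0]]))  # 1
```

The `PolicyType` signature lets a type condition on the full history, even though the two example types do not. Because the likelihood here is a product, one observed action that a type deems impossible drives that type's posterior to zero permanently. This is one way an inaccurate type specification can interact with the choice of posterior.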
Related papers
- An Empirical Study On The Practical Impact Of Prior Beliefs Over Policy Types (2019)
- Risk-sensitive Bayesian Games For Multi-agent Reinforcement Learning Under Policy Uncertainty (2022)
- A Generalized Training Approach For Multiagent Learning (2019)
- On Information Asymmetry In Competitive Multi-agent Reinforcement Learning: Convergence And Optimality (2020)
- Policy Optimization For Markov Games: Unified Framework And Faster Convergence (2022)
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018)
- Local Optimization Achieves Global Optimality In Multi-agent Reinforcement Learning (2023)
- Non-cooperative Multi-agent Systems With Exploring Agents (2020)