Policy Evaluation And Seeking For Multi-agent Reinforcement Learning Via Best Response
2020 · Rui Yan, Xiaoming Duan, Zongying Shi, et al.
Abstract
This paper introduces two metrics (cycle-based and memory-based metrics), grounded in a dynamical game-theoretic solution concept called sink equilibrium, for the evaluation, ranking, and computation of policies in multi-agent learning. We adopt strict best response dynamics (SBRD) to model selfish behaviors at a meta-level for multi-agent reinforcement learning. Our approach can deal with dynamical cyclical behaviors (unlike approaches based on Nash equilibria and Elo ratings), and is more compatible with single-agent reinforcement learning than alpha-rank, which relies on weakly better responses. We first consider settings where the difference between the largest and second-largest values of the underlying metric has a known lower bound. With this knowledge, we propose a class of perturbed SBRD with the following property: only policies with maximum metric are observed with nonzero probability for a broad class of stochastic games with finite memory. We then consider settings where the lower bound for t
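As a rough illustration of the sink-equilibrium idea named in the abstract, the minimal sketch below builds a strict best response graph over the pure joint strategies of a small normal-form game and returns the sink strongly connected components of that graph. It is not the paper's algorithm. The function names, the payoff-dictionary format, and the convention that a player may move to any strictly improving best response are assumptions made for illustration; the paper applies SBRD at a meta-level, where the "strategies" are policies.

```python
from itertools import product


def strict_best_response_graph(payoffs, n_actions):
    """Map each joint profile to the profiles reachable by one strict best response.

    payoffs: dict from joint profile (tuple of action indices) to a tuple of
             per-player payoffs (hypothetical input format).
    n_actions: number of actions available to each player.
    """
    graph = {}
    for profile in product(*(range(n) for n in n_actions)):
        successors = set()
        for player, n in enumerate(n_actions):
            current = payoffs[profile][player]
            # Payoff to `player` for every unilateral deviation.
            deviations = {
                a: payoffs[profile[:player] + (a,) + profile[player + 1:]][player]
                for a in range(n)
            }
            best = max(deviations.values())
            if best > current:  # move only on a strict improvement
                for a, value in deviations.items():
                    if value == best:
                        successors.add(profile[:player] + (a,) + profile[player + 1:])
        graph[profile] = successors
    return graph


def reachable(graph, start):
    """Return every node reachable from `start`, including `start` itself."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def sink_equilibria(graph):
    """Return the sink strongly connected components as frozensets of profiles."""
    reach = {node: reachable(graph, node) for node in graph}
    sinks = set()
    for node, r in reach.items():
        # `node` lies in a sink component iff everything it reaches can reach it back.
        if all(node in reach[other] for other in r):
            sinks.add(frozenset(r))
    return sinks


# Matching pennies: player 0 wins on a match, player 1 on a mismatch.
matching_pennies = {
    (0, 0): (1, -1), (0, 1): (-1, 1),
    (1, 0): (-1, 1), (1, 1): (1, -1),
}
# Prisoner's dilemma: action 0 = cooperate, action 1 = defect.
prisoners_dilemma = {
    (0, 0): (3, 3), (0, 1): (0, 5),
    (1, 0): (5, 0), (1, 1): (1, 1),
}

mp_sinks = sink_equilibria(strict_best_response_graph(matching_pennies, (2, 2)))
pd_sinks = sink_equilibria(strict_best_response_graph(prisoners_dilemma, (2, 2)))
```

In matching pennies the four profiles form a single best response cycle, so the only sink component contains all of them and there is no pure Nash equilibrium to report. In the prisoner's dilemma the sink component is the single profile (1, 1), which coincides with the pure Nash equilibrium. Reachability is computed by brute force here, which is adequate only for small games.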
Related papers
- Evaluation And Learning In Two-player Symmetric Games Via Best And Better Responses (2022)
- Population-based Evaluation In Repeated Rock-paper-scissors As A Benchmark For Multiagent Reinforcement Learning (2023)
- Bounded Risk-sensitive Markov Games: Forward Policy Design And Inverse Reward Learning With Iterative Reasoning And Cumulative Prospect Theory (2020)
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018)
- A Generalized Training Approach For Multiagent Learning (2019)
- Reinforcing Competitive Multi-agents For Playing 'So Long Sucker' (2024)
- Unifying Behavioral And Response Diversity For Open-ended Learning In Zero-sum Games (2021)
- On Convergence And Optimality Of Best-response Learning With Policy Types In Multiagent Systems (2019)