Learning To Play Against Any Mixture Of Opponents
2020 · Max Olan Smith, Thomas Anthony, Yongzhao Wang, et al.
Abstract
Intuitively, experience playing against one mixture of opponents in a given domain should be relevant for a different mixture in the same domain. We propose a transfer learning method, Q-Mixing, that starts by learning Q-values against each pure-strategy opponent. Then a Q-value for any distribution of opponent strategies is approximated by appropriately averaging the separately learned Q-values. From these components, we construct policies against all opponent mixtures without any further training. We empirically validate Q-Mixing in two environments: a simple grid-world soccer environment, and a complicated cyber-security game. We find that Q-Mixing is able to successfully transfer knowledge across any mixture of opponents. We next consider the use of observations during play to update the believed distribution of opponents. We introduce an opponent classifier -- trained in parallel to Q-learning, using the same data -- and use the classifier results to refine the mixing of Q-values.
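The mixing step described in the abstract can be illustrated with a small sketch. It assumes the per-opponent Q-values for the current observation are already available as plain lists, one per pure-strategy opponent. The function names (`mix_q_values`, `refine_belief`, `greedy_action`) are hypothetical. Multiplying the prior by the classifier output and renormalising is only one plausible reading of "refine the mixing"; the abstract does not specify the exact weighting rule.

```python
def mix_q_values(q_per_opponent, weights):
    """Average per-opponent Q-value vectors, weighted by the opponent distribution."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_actions = len(q_per_opponent[0])
    mixed = [0.0] * n_actions
    for q, w in zip(q_per_opponent, weights):
        for a in range(n_actions):
            mixed[a] += (w / total) * q[a]
    return mixed


def refine_belief(prior, classifier_probs):
    """Assumed update: reweight the prior by the classifier output and renormalise."""
    unnormalised = [p * c for p, c in zip(prior, classifier_probs)]
    total = sum(unnormalised)
    if total <= 0:
        return list(prior)  # classifier gives no usable signal; keep the prior
    return [u / total for u in unnormalised]


def greedy_action(q_values):
    """Index of the action with the highest mixed Q-value."""
    return max(range(len(q_values)), key=q_values.__getitem__)


# Toy numbers (illustrative only): two opponents, two actions.
q_per_opponent = [[1.0, 0.0],   # Q-values learned against opponent 0
                  [0.0, 2.0]]   # Q-values learned against opponent 1
prior = [0.75, 0.25]            # opponent mixture announced before play

mixed_prior = mix_q_values(q_per_opponent, prior)
action_prior = greedy_action(mixed_prior)

# Hypothetical classifier output after observing some play.
belief = refine_belief(prior, [0.2, 0.8])
mixed_refined = mix_q_values(q_per_opponent, belief)
action_refined = greedy_action(mixed_refined)
```

With these toy numbers the prior mixture favours the first action, while the refined belief shifts enough weight to the second opponent that the greedy action changes. No retraining is involved in either case; only the weights change.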
Related papers
- Weighted QMIX: Expanding Monotonic Value Function Factorisation For Deep Multi-agent Reinforcement Learning (2020)
- Contextual Policy Transfer In Reinforcement Learning Domains Via Deep Mixtures-of-experts (2020)
- Simplex Neural Population Learning: Any-mixture Bayes-optimality In Symmetric Zero-sum Games (2022)
- Double Deep Q-learning In Opponent Modeling (2022)
- QMIX: Monotonic Value Function Factorisation For Deep Multi-agent Reinforcement Learning (2018)
- MMD-MIX: Value Function Factorisation With Maximum Mean Discrepancy For Cooperative Multi-agent Reinforcement Learning (2021)
- POWQMIX: Weighted Value Factorization With Potentially Optimal Joint Actions Recognition For Cooperative Multi-agent Reinforcement Learning (2024)
- QR-MIX: Distributional Value Function Factorisation For Cooperative Multi-agent Reinforcement Learning (2020)