Combining Deep Reinforcement Learning And Search For Imperfect-information Games
2020 · Noam Brown, Anton Bakhtin, Adam Lerer, et al.
Abstract
The combination of deep reinforcement learning and search at both training and test time is a powerful paradigm that has led to a number of successes in single-agent settings and perfect-information games, best exemplified by AlphaZero. However, prior algorithms of this form cannot cope with imperfect-information games. This paper presents ReBeL, a general framework for self-play reinforcement learning and search that provably converges to a Nash equilibrium in any two-player zero-sum game. In the simpler setting of perfect-information games, ReBeL reduces to an algorithm similar to AlphaZero. Results in two different imperfect-information games show ReBeL converges to an approximate Nash equilibrium. We also show ReBeL achieves superhuman performance in heads-up no-limit Texas hold'em poker, while using far less domain knowledge than any prior poker AI.
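To make the phrase "converges to an approximate Nash equilibrium" concrete, the sketch below runs self-play with regret matching on rock-paper-scissors, a tiny two-player zero-sum game, and measures how far the averaged strategies are from equilibrium. This is not ReBeL and is not taken from the paper: the choice of game, the function names (`regret_matching_strategy`, `self_play`, `exploitability`), the iteration count, and the asymmetric initial regret are all assumptions made for illustration.

```python
# Row player's payoff in rock-paper-scissors; the column player receives the negation.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]
N = 3


def regret_matching_strategy(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0:
        return [1.0 / N] * N  # no positive regret: fall back to uniform
    return [p / total for p in positive]


def self_play(iterations=20000):
    """Both players run regret matching; return their time-averaged strategies."""
    regrets = [[0.0] * N, [0.0] * N]
    regrets[0][0] = 1.0  # assumed asymmetric start so play does not begin at equilibrium
    strategy_sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        row, col = (regret_matching_strategy(r) for r in regrets)
        for a in range(N):
            strategy_sums[0][a] += row[a]
            strategy_sums[1][a] += col[a]
        # Expected value of each action against the opponent's current strategy.
        row_values = [sum(PAYOFF[a][b] * col[b] for b in range(N)) for a in range(N)]
        col_values = [sum(-PAYOFF[a][b] * row[a] for a in range(N)) for b in range(N)]
        row_ev = sum(row[a] * row_values[a] for a in range(N))
        col_ev = sum(col[b] * col_values[b] for b in range(N))
        for a in range(N):
            regrets[0][a] += row_values[a] - row_ev
            regrets[1][a] += col_values[a] - col_ev
    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(row, col):
    """Total gain available to best responses; zero exactly at a Nash equilibrium."""
    best_row = max(sum(PAYOFF[a][b] * col[b] for b in range(N)) for a in range(N))
    best_col = max(sum(-PAYOFF[a][b] * row[a] for a in range(N)) for b in range(N))
    return best_row + best_col  # the game value is 0, so no correction term is needed


if __name__ == "__main__":
    avg_row, avg_col = self_play()
    print("average strategies:", avg_row, avg_col)
    print("exploitability:", exploitability(avg_row, avg_col))
```

It is the time-averaged strategies, not the final iterates, that approach equilibrium here, and exploitability is one common way to quantify "approximate" in this setting.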
Related papers
- Deep Reinforcement Learning From Self-play In Imperfect-information Games (2016)
- Score-based Equilibrium Learning In Multi-player Finite Games With Imperfect Information (2023)
- Impartial Games: A Challenge For Reinforcement Learning (2022)
- Colosseumrl: A Framework For Multiagent Reinforcement Learning In \(n\)-player Games (2019)
- Regret-guided Search Control For Efficient Learning In Alphazero (2026)
- Supervised And Reinforcement Learning From Observations In Reconnaissance Blind Chess (2022)
- Reinforcement Learning In Two Player Zero Sum Simultaneous Action Games (2021)
- Simplified Action Decoder For Deep Multi-agent Reinforcement Learning (2019)