Approximate Exploitability: Learning A Best Response In Large Games
2020 · Finbarr Timbers, Nolan Bard, Edward Lockhart, et al.
Abstract
Researchers have demonstrated that neural networks are vulnerable to adversarial examples and subtle environment changes, both of which one can view as a form of distribution shift. To humans, the resulting errors can look like blunders, eroding trust in these agents. In prior games research, agent evaluation often focused on the in-practice game outcomes. While valuable, such evaluation typically fails to evaluate robustness to worst-case outcomes. Prior research in computer poker has examined how to assess such worst-case performance, both exactly and approximately. Unfortunately, exact computation is infeasible with larger domains, and existing approximations rely on poker-specific knowledge. We introduce ISMCTS-BR, a scalable search-based deep reinforcement learning algorithm for learning a best response to an agent, thereby approximating worst-case performance. We demonstrate the technique in several two-player zero-sum games against a variety of agents, including several AlphaZer
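As a minimal illustration of the worst-case evaluation the abstract describes, the sketch below computes an exact best response to a fixed policy in a tiny matrix game. It is not ISMCTS-BR: it enumerates every action, which is exactly what becomes infeasible in large games. The game (rock-paper-scissors), the function name `best_response_value`, and the example policies are assumptions chosen for illustration and do not come from the paper.

```python
from typing import List, Sequence, Tuple

def best_response_value(
    payoff: Sequence[Sequence[float]],
    opponent_policy: Sequence[float],
) -> Tuple[int, float]:
    """Return (action, value) of the row player's best response.

    payoff[i][j] is the row player's payoff when the row player picks i
    and the fixed opponent (column player) picks j.
    """
    # Expected payoff of each row action against the fixed column policy.
    values: List[float] = [
        sum(p * u for p, u in zip(opponent_policy, row)) for row in payoff
    ]
    # Exhaustive enumeration: only possible because the game is tiny.
    best_action = max(range(len(values)), key=values.__getitem__)
    return best_action, values[best_action]

# Rock-paper-scissors payoffs for the row player
# (rows and columns ordered rock, paper, scissors). Hypothetical example game.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

uniform = [1 / 3, 1 / 3, 1 / 3]   # the equilibrium policy
rock_heavy = [0.5, 0.25, 0.25]    # a policy that overplays rock

print(best_response_value(RPS, uniform))
print(best_response_value(RPS, rock_heavy))
```

Against the uniform policy no action gains anything, while the rock-heavy policy loses 0.25 per game on average to always playing paper. Because this game has value 0, that best-response value is one measure of how far the policy is from being unexploitable; a learned best response can only approximate it from below.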
Related papers
- Efficient Exploration Of Zero-sum Stochastic Games (2020)
- Learning From Learners: Adapting Reinforcement Learning Agents To Be Competitive In A Card Game (2020)
- In-context Exploiter For Extensive-form Games (2024)
- Combining Tree-search, Generative Models, And Nash Bargaining Concepts In Game-theoretic Reinforcement Learning (2023)
- Learning To Play No-press Diplomacy With Best Response Policy Iteration (2020)
- Are AlphaZero-like Agents Robust To Adversarial Perturbations? (2022)
- A Deep Reinforcement Learning Approach For Finding Non-exploitable Strategies In Two-player Atari Games (2022)
- Evaluation And Learning In Two-player Symmetric Games Via Best And Better Responses (2022)