Abstract

Regret minimization has proved to be a versatile tool for tree-form sequential decision making and extensive-form games. In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium. Most regret-minimization algorithms for tree-form sequential decision making, including CFR, require (i) an exact model of the player's decision nodes, observation nodes, and how they are linked, and (ii) full knowledge, at all times t, of the payoffs -- even in parts of the decision space that are not encountered at time t. Recently, there has been growing interest in relaxing some of those restrictions and making regret minimization applicable to settings for which reinforcement learning methods have traditionally been used -- for example, those in which only black-box access to the environment is available. We give the first, to our knowledge, regret-minimizati

Authors

(none)

Tags

  • Game AI

Stats

  • citations: 4
  • S2 citations: n/a
  • GitHub stars: 0
  • HF likes: 0
  • heat score: 5.24
  • arXiv key: farina2021model

Related papers