Model-free Online Learning In Unknown Sequential Decision Making Problems And Games
2021 · Gabriele Farina, Tuomas Sandholm
Abstract
Regret minimization has proved to be a versatile tool for tree-form sequential decision making and extensive-form games. In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium. Most regret-minimization algorithms for tree-form sequential decision making, including CFR, require (i) an exact model of the player's decision nodes, observation nodes, and how they are linked, and (ii) full knowledge, at all times t, about the payoffs -- even in parts of the decision space that are not encountered at time t. Recently, there has been growing interest towards relaxing some of those restrictions and making regret minimization applicable to settings for which reinforcement learning methods have traditionally been used -- for example, those in which only black-box access to the environment is available. We give the first, to our knowledge, regret-minimizati
Related papers
- Online Learning In Unknown Markov Games (2020) · 0.00
- Regret Minimization And Convergence To Equilibria In General-sum Markov Games (2022) · 0.00
- Decentralized Model-free Reinforcement Learning In Stochastic Games With Average-reward Objective (2023) · 0.00
- The Fallacy Of Minimizing Cumulative Regret In The Sequential Task Setting (2024) · 0.00
- On The Complexity Of Computing Sparse Equilibria And Lower Bounds For No-regret Learning In Games (2023) · 0.00
- Evolutionary Dynamics And \(\phi\)-regret Minimization In Games (2021) · 3.58
- Best Of Both Worlds: Regret Minimization Versus Minimax Play (2025) · 0.00
- Combining No-regret And Q-learning (2019) · 0.00