Abstract

Learning in games refers to scenarios where multiple players interact in a shared environment, each aiming to minimize their regret. An equilibrium can be computed at a fast rate of \(O(1/T)\) when all players follow the optimistic follow-the-regularized-leader (OFTRL). However, this acceleration is limited to the honest regime, in which all players adhere to a prescribed algorithm -- a situation that may not be realistic in practice. To address this issue, we present corrupted learning dynamics that adaptively find an equilibrium at a rate that depends on the extent to which each player deviates from the strategy suggested by the prescribed algorithm. First, in two-player zero-sum corrupted games, we provide learning dynamics for which the external regret of \(x\)-player (and similarly for \(y\)-player) is roughly bounded by \(O(log (m_x m_y) + \sqrt\{\hat\{C\}_y\} + \hat\{C\}_x)\), where \(m_x\) and \(m_y\) denote the number of actions of \(x\)- and \(y\)-players, respectively, and \

Authors

(none)

Tags

  • Game AI

Stats

Related papers