Abstract

In this paper we establish efficient and *uncoupled* learning dynamics so that, when employed by all players in a general-sum multiplayer game, the *swap regret* of each player after \(T\) repetitions of the game is bounded by \(O(log T)\), improving over the prior best bounds of \(O(log^4 (T))\). At the same time, we guarantee optimal \(O(\sqrt\{T\})\) swap regret in the adversarial regime as well. To obtain these results, our primary contribution is to show that when all players follow our dynamics with a *time-invariant* learning rate, the *second-order path lengths* of the dynamics up to time \(T\) are bounded by \(O(log T)\), a fundamental property which could have further implications beyond near-optimally bounding the (swap) regret. Our proposed learning dynamics combine in a novel way *optimistic* regularized learning with the use of *self-concordant barriers*. Further, our analysis is remarkably simple, bypassing the cumbersome framework of higher-order smoothness recently dev

Authors

(none)

Tags

  • Game AI

Stats

Related papers