Online Learning For Uninformed Markov Games: Empirical Nash-value Regret And Non-stationarity Adaptation
2026 Β· Junyan Liu, Haipeng Luo, Zihan Zhang, et al.
Abstract
We study online learning in two-player uninformed Markov games, where the opponent's actions and policies are unobserved. In this setting, Tian et al. (2021) show that achieving no-external-regret is impossible without incurring an exponential dependence on the episode length \(H\). They then turn to the weaker notion of Nash-value regret and propose a V-learning algorithm with regret \(O(K^\{2/3\})\) after \(K\) episodes. However, their algorithm and guarantee do not adapt to the difficulty of the problem: even in the case where the opponent follows a fixed policy and thus \(O(\sqrt\{K\})\) external regret is well-known to be achievable, their result is still the worse rate \(O(K^\{2/3\})\) on a weaker metric. In this work, we fully address both limitations. First, we introduce empirical Nash-value regret, a new regret notion that is strictly stronger than Nash-value regret and naturally reduces to external regret when the opponent follows a fixed policy. Moreover, under this new me
Authors
(none)
Tags
Stats
Related papers
- Online Learning In Unknown Markov Games (2020)0.00
- Learning In Markov Games With Adaptive Adversaries: Policy Regret, Fundamental Barriers, And Efficient Algorithms (2024)0.00
- Regret Minimization And Convergence To Equilibria In General-sum Markov Games (2022)0.00
- Learning Two-player Mixture Markov Games: Kernel Function Approximation And Correlated Equilibrium (2022)0.00
- No-regret Learning In Unknown Games With Correlated Payoffs (2019)0.00
- Adversarial Learning In Games With Bandit Feedback: Logarithmic Pure-strategy Maximin Regret (2026)0.00
- Model-free Online Learning In Unknown Sequential Decision Making Problems And Games (2021)5.24
- Dynamic Regret Of Online Markov Decision Processes (2022)0.00