Pessimism-free Offline Learning In General-sum Games Via KL Regularization
2026 · Claire Chen, Yuheng Zhang
Abstract
arXiv:2605.00264v1
Offline multi-agent reinforcement learning in general-sum settings is challenged by the distribution shift between logged datasets and target equilibrium policies. While standard methods rely on manual pessimistic penalties, we demonstrate that KL regularization suffices to stabilize learning and achieve equilibrium recovery. We propose General-sum Anchored Nash Equilibrium (GANE), which recovers regularized Nash equilibria at an accelerated statistical rate of \(\widetilde{O}(1/n)\). For computational tractability, we develop General-sum Anchored Mirror Descent (GAMD), an iterative algorithm converging to a Coarse Correlated Equilibrium at the standard rate of \(\widetilde{O}(1/\sqrt{n}+1/T)\). These results establish KL regularization as a standalone mechanism for pessimism-free offline learning that achieves equivalent or accelerated rates in multi-player general-sum games.
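To make the idea of an "anchored" mirror-descent update concrete, here is a minimal sketch of a generic KL-regularized mirror-descent step in a two-player general-sum matrix game. It is not the paper's GAMD: the abstract does not specify the update rule, and this sketch uses exact, known payoff matrices rather than payoffs estimated from an offline dataset. The names `anchored_md_step`, `run`, `eta` (step size), `beta` (KL weight), and `anchor` (reference policy) are all assumptions made for illustration.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def anchored_md_step(policy, anchor, payoffs, eta, beta):
    """One KL-anchored mirror-descent step for a single player (illustrative).

    Maximizes  <p, payoffs> - beta * KL(p || anchor) - (1/eta) * KL(p || policy),
    whose closed form is
        p  proportional to  (policy * anchor**(eta*beta) * exp(eta*payoffs)) ** (1/(1+eta*beta)).
    All entries of `policy` and `anchor` must be strictly positive.
    """
    scale = 1.0 / (1.0 + eta * beta)
    logits = [
        scale * (math.log(p) + eta * beta * math.log(a) + eta * u)
        for p, a, u in zip(policy, anchor, payoffs)
    ]
    top = max(logits)  # subtract the max for numerical stability
    return normalize([math.exp(l - top) for l in logits])


def expected_payoffs(payoff_matrix, opponent_policy):
    """Expected payoff of each own action (row) against the opponent's mixed policy."""
    return [
        sum(value * prob for value, prob in zip(row, opponent_policy))
        for row in payoff_matrix
    ]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def run(payoff_a, payoff_b, anchor_a, anchor_b, eta=0.5, beta=1.0, steps=200):
    """Simultaneous anchored updates for both players.

    payoff_a[i][j] and payoff_b[i][j] are the payoffs to players A and B
    when A plays action i and B plays action j.
    """
    policy_a, policy_b = list(anchor_a), list(anchor_b)
    payoff_b_rows = transpose(payoff_b)  # index B's payoffs by B's own action
    for _ in range(steps):
        u_a = expected_payoffs(payoff_a, policy_b)
        u_b = expected_payoffs(payoff_b_rows, policy_a)
        policy_a = anchored_md_step(policy_a, anchor_a, u_a, eta, beta)
        policy_b = anchored_md_step(policy_b, anchor_b, u_b, eta, beta)
    return policy_a, policy_b
```

With fixed payoffs, repeated calls to `anchored_md_step` settle at the policy proportional to `anchor * exp(payoffs / beta)`, which shows how the KL term pulls the solution toward the anchor: a larger `beta` keeps the policy closer to the reference, a smaller one lets payoffs dominate.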
Related papers
- Model-based Reinforcement Learning For Offline Zero-sum Markov Games (2022)
- Regret Minimization And Convergence To Equilibria In General-sum Markov Games (2022)
- Conservative Equilibrium Discovery In Offline Game-theoretic Multiagent Reinforcement Learning (2026)
- Provably Efficient Reinforcement Learning In Decentralized General-sum Markov Games (2021)
- Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning From Offline Datasets (2022)
- Provably Efficient Generalized Lagrangian Policy Optimization For Safe Multi-agent Reinforcement Learning (2023)
- Nearly Minimax Optimal Offline Reinforcement Learning With Linear Function Approximation: Single-agent MDP And Markov Game (2022)
- State-aware Proximal Pessimistic Algorithms For Offline Reinforcement Learning (2022)