Conservative Equilibrium Discovery In Offline Game-theoretic Multiagent Reinforcement Learning
2026 · Austin A. Nguyen, Michael P. Wellman
Abstract
Offline learning of strategies takes data efficiency to its extreme by restricting algorithms to a fixed dataset of state-action trajectories. We consider the problem in a mixed-motive multiagent setting, where the goal is to solve a game under the offline learning constraint. We first frame this problem in terms of selecting among candidate equilibria. Since datasets may inform only a small fraction of game dynamics, it is generally infeasible in offline game-solving to even verify a proposed solution is a true equilibrium. Therefore, we consider the relative probability of low regret (i.e., closeness to equilibrium) across candidates based on the information available. Specifically, we extend Policy Space Response Oracles (PSRO), an online game-solving approach, by quantifying game dynamics uncertainty and modifying the RL objective to skew towards solutions more likely to have low regret in the true game. We further propose a novel meta-strategy solver, tailored for the offline sett
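The abstract uses regret as its measure of closeness to equilibrium. As a minimal sketch of that standard quantity only (not the paper's offline method, which has no access to the true payoffs), the snippet below computes the regret of a mixed-strategy profile in a small two-player normal-form game. The function name `profile_regret`, the payoff matrices, and the Prisoner's Dilemma numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical Prisoner's Dilemma payoffs (actions: 0 = cooperate, 1 = defect).
# ROW[i, j] is the row player's payoff, COL[i, j] the column player's payoff,
# when row plays i and column plays j.
ROW = np.array([[3.0, 0.0],
                [5.0, 1.0]])
COL = np.array([[3.0, 5.0],
                [0.0, 1.0]])


def profile_regret(payoff_row, payoff_col, x, y):
    """Regret of the mixed profile (x, y): the largest gain any single
    player could obtain by unilaterally deviating to a best response."""
    payoff_row = np.asarray(payoff_row, dtype=float)
    payoff_col = np.asarray(payoff_col, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Expected payoff of each player under the profile.
    u_row = x @ payoff_row @ y
    u_col = x @ payoff_col @ y

    # Best payoff each player can reach with a pure deviation,
    # holding the opponent's strategy fixed.
    best_row = np.max(payoff_row @ y)
    best_col = np.max(x @ payoff_col)

    return max(best_row - u_row, best_col - u_col)


regret_defect = profile_regret(ROW, COL, [0.0, 1.0], [0.0, 1.0])
regret_cooperate = profile_regret(ROW, COL, [1.0, 0.0], [1.0, 0.0])
```

A profile is a Nash equilibrium exactly when this regret is zero. Here mutual defection has regret 0, while mutual cooperation has regret 2 because either player gains 2 by defecting. In the offline setting the abstract describes, the true payoff matrices are unknown, so this value can only be estimated from the dataset.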
Related papers
- Offline Fictitious Self-play For Competitive Games (2024)
- Plan Better Amid Conservatism: Offline Multi-agent Reinforcement Learning With Actor Rectification (2021)
- Counterfactual Conservative Q Learning For Offline Multi-agent Reinforcement Learning (2023)
- Sample Efficient Active Algorithms For Offline Reinforcement Learning (2026)
- Revisiting Design Choices In Offline Model-based Reinforcement Learning (2021)
- Leveraging Offline Data In Online Reinforcement Learning (2022)
- Believe What You See: Implicit Constraint Approach For Offline Multi-agent Reinforcement Learning (2021)
- Offline Policy Evaluation For Reinforcement Learning With Adaptively Collected Data (2023)