Learning Equilibria From Data: Provably Efficient Multi-agent Imitation Learning
2025 Β· Till Freihaut, Luca Viano, Volkan Cevher, et al.
Abstract
This paper provides the first expert sample complexity characterization for learning a Nash equilibrium from expert data in Markov Games. We show that a new quantity named the single policy deviation concentrability coefficient is unavoidable in the non-interactive imitation learning setting, and we provide an upper bound for behavioral cloning (BC) featuring such coefficient. BC exhibits substantial regret in games with high concentrability coefficient, leading us to utilize expert queries to develop and introduce two novel solution algorithms: MAIL-BRO and MURMAIL. The former employs a best response oracle and learns an \(\epsilon\)-Nash equilibrium with \(\mathcal\{O\}(\epsilon^\{-4\})\) expert and oracle queries. The latter bypasses completely the best response oracle at the cost of a worse expert query complexity of order \(\mathcal\{O\}(\epsilon^\{-8\})\). Finally, we provide numerical evidence, confirming our theoretical findings.
Authors
(none)
Tags
Stats
Related papers
- Matching Multiple Experts: On The Exploitability Of Multi-agent Imitation Learning (2026)0.00
- Efficiently Computing Nash Equilibria In Adversarial Team Markov Games (2022)0.00
- Learning Equilibria In Adversarial Team Markov Games: A Nonconvex-hidden-concave Min-max Optimization Problem (2024)0.00
- Strategically Robust Multi-agent Reinforcement Learning With Linear Function Approximation (2026)0.00
- Consistent Opponent Modeling In Imperfect-information Games (2025)0.00
- Model-free Learning For Two-player Zero-sum Partially Observable Markov Games With Perfect Recall (2021)0.00
- Toward The Fundamental Limits Of Imitation Learning (2020)0.00
- Minimax-optimal Multi-agent RL In Markov Games With A Generative Model (2022)2.26