Matching Multiple Experts: On The Exploitability Of Multi-agent Imitation Learning
2026 · Antoine Bergerault, Volkan Cevher, Negar Mehr
Abstract
Multi-agent imitation learning (MA-IL) aims to learn optimal policies from expert demonstrations of interactions in multi-agent domains. Despite existing guarantees on the performance of the resulting learned policies, characterizations of how far the learned policies are from a Nash equilibrium are missing for offline MA-IL. In this paper, we demonstrate impossibility and hardness results for learning low-exploitability policies in general \(n\)-player Markov Games. We do so by providing examples where even exact measure matching fails, and by demonstrating a new hardness result on characterizing the Nash gap given a fixed measure matching error. We then show how these challenges can be overcome using strategic dominance assumptions on the expert equilibrium. Specifically, for the case of dominant strategy expert equilibria, assuming Behavioral Cloning error \(\epsilon_{\text{BC}}\), this provides a Nash imitation gap of \(\mathcal{O}\left(n\epsilon_{\text{BC}}/(1-\gamma
Related papers
- Learning Equilibria From Data: Provably Efficient Multi-agent Imitation Learning (2025)
- Toward The Fundamental Limits Of Imitation Learning (2020)
- Minimax Optimal Online Imitation Learning Via Replay Estimation (2022)
- Mimicking To Dominate: Imitation Learning Strategies For Success In Multiagent Competitive Games (2023)
- A Bayesian Solution To The Imitation Gap (2024)
- Consistent Opponent Modeling In Imperfect-information Games (2025)
- Of Moments And Matching: A Game-theoretic Framework For Closing The Imitation Gap (2021)
- Beyond-expert Performance With Limited Demonstrations: Efficient Imitation Learning With Double Exploration (2025)