Strategically Efficient Exploration In Competitive Multi-agent Reinforcement Learning
2021 · Robert Loftin, Aadirupa Saha, Sam Devlin, et al.
Abstract
High sample complexity remains a barrier to the application of reinforcement learning (RL), particularly in multi-agent systems. A large body of work has demonstrated that exploration mechanisms based on the principle of optimism under uncertainty can significantly improve the sample efficiency of RL in single-agent tasks. This work seeks to understand the role of optimistic exploration in non-cooperative multi-agent settings. We show that, in zero-sum games, optimistic exploration can cause the learner to waste time sampling parts of the state space that are irrelevant to strategic play, as they can only be reached through cooperation between both players. To address this issue, we introduce a formal notion of strategically efficient exploration in Markov games and use it to develop two strategically efficient learning algorithms for finite Markov games. We demonstrate that these methods can be significantly more sample efficient than their optimistic counterparts.
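The cooperation problem described in the abstract can be seen in a single-state toy example. The Python below is a minimal sketch under stated assumptions: it uses a one-shot 2x2 zero-sum matrix game rather than a Markov game, hypothetical confidence bounds `lo`/`hi` on the row player's payoff, and hypothetical helpers `row_maximin`, `col_minimax`, and `joint_optimistic_action`. The "strategic" rule it contrasts with naive joint optimism (row plans on upper bounds, column on lower bounds, each against a worst-case opponent) is an illustrative assumption, not the paper's algorithm.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Payoff matrices hold the ROW (maximizing) player's payoff; the column
# player minimizes it (zero-sum).
Matrix = Sequence[Sequence[float]]


def row_maximin(A: Matrix) -> Tuple[float, float]:
    """Maximin mixed strategy for a row player with exactly two actions.

    Returns (p, value), where p is the probability of row action 1.
    The worst case over columns, min_j [(1 - p) * A[0][j] + p * A[1][j]],
    is concave and piecewise linear in p, so its maximum lies at p = 0,
    p = 1, or a point where two column lines cross.
    """
    assert len(A) == 2, "this sketch only handles two row actions"
    n = len(A[0])

    def worst_case(p: float) -> float:
        return min((1 - p) * A[0][j] + p * A[1][j] for j in range(n))

    candidates: List[float] = [0.0, 1.0]
    for j, k in combinations(range(n), 2):
        # Column j's line is A[0][j] + p * (A[1][j] - A[0][j]).
        slope_diff = (A[1][j] - A[0][j]) - (A[1][k] - A[0][k])
        if slope_diff != 0:
            p = (A[0][k] - A[0][j]) / slope_diff
            if 0.0 <= p <= 1.0:
                candidates.append(p)
    best = max(candidates, key=worst_case)  # ties keep the earliest candidate
    return best, worst_case(best)


def col_minimax(A: Matrix) -> Tuple[float, float]:
    """Minimax strategy for a column player with exactly two actions.

    Returns (q, value), where q is the probability of column action 1.
    Solved by treating the column player as a maximizer of -A transposed.
    """
    B = [[-A[i][j] for i in range(len(A))] for j in range(len(A[0]))]
    q, neg_value = row_maximin(B)
    return q, -neg_value


def joint_optimistic_action(hi: Matrix) -> Tuple[int, int]:
    """Naive optimism over JOINT actions: pick the cell with the largest
    upper bound, as if both players cooperated to maximize it."""
    cells = [(i, j) for i in range(len(hi)) for j in range(len(hi[0]))]
    return max(cells, key=lambda ij: hi[ij[0]][ij[1]])


# Hypothetical confidence bounds. Row actions: 0 = "stay", 1 = "enter";
# column actions: 0 = "block", 1 = "allow". Only cell (1, 1) is unexplored,
# and the row player can reach it only if the column player allows it.
lo = [[0.5, 0.5], [0.0, 0.0]]
hi = [[0.5, 0.5], [0.0, 1.0]]

print(joint_optimistic_action(hi))  # naive optimism targets cell (1, 1)

# Assumed strategic rule: each player is optimistic only about its own
# payoff (row plans on hi, column plans on lo) and treats the opponent as
# an adversary, so neither counts on the other's cooperation.
p, v_row = row_maximin(hi)
q, v_col = col_minimax(lo)
print(p, v_row, q, v_col, p * q)  # p * q = chance of sampling cell (1, 1)
```

Whatever value the unexplored cell turns out to have in [0, 1], the row player's maximin strategy puts zero weight on "enter", because the column player can always block and hold the row player to 0. In this toy game that is the sense in which the cell is irrelevant to strategic play, while the naive joint-optimism rule still spends samples on it.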
Related papers
- Minimax-optimal Multi-agent RL In Markov Games With A Generative Model (2022)
- Toward Risk-based Optimistic Exploration For Cooperative Multi-agent Reinforcement Learning (2023)
- Incentivize Without Bonus: Provably Efficient Model-based Online Multi-agent RL For Markov Games (2025)
- On Optimistic Versus Randomized Exploration In Reinforcement Learning (2017)
- Fast Exploration With Simplified Models And Approximately Optimistic Planning In Model Based Reinforcement Learning (2018)
- Sample-efficient Robust Multi-agent Reinforcement Learning In The Face Of Environmental Uncertainty (2024)
- Towards Better Sample Efficiency In Multi-agent Reinforcement Learning Via Exploration (2025)
- Enhancing Sample Efficiency In Multi-agent RL With Uncertainty Quantification And Selective Exploration (2025)