Optimistic \(\epsilon\)-greedy Exploration For Cooperative Multi-agent Reinforcement Learning
2025 · Ruoning Zhang, Siying Wang, Wenyu Chen, et al.
Abstract
The Centralized Training with Decentralized Execution (CTDE) paradigm is widely used in cooperative multi-agent reinforcement learning. However, because traditional monotonic value decomposition methods have limited representational capacity, algorithms can underestimate optimal actions, which leads policies to suboptimal solutions. To address this challenge, we propose Optimistic \(\epsilon\)-Greedy Exploration, which enhances exploration in order to correct value estimates. Our analysis indicates that the underestimation arises from insufficient sampling of optimal actions during exploration. We introduce an optimistic updating network to identify optimal actions, and during exploration we sample actions from its distribution with probability \(\epsilon\), which increases the selection frequency of optimal actions. Experimental results in various environments show that Optimistic \(\epsilon\)-Greedy Exploration effectively prevents the algorithm from converging to suboptimal solutions and significantly i
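The exploration rule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract only says that, with probability \(\epsilon\), actions are sampled from the distribution of the optimistic network. The function name `optimistic_epsilon_greedy`, the softmax used to turn the optimistic network's outputs into a distribution, and the plain per-agent arrays are all assumptions made for this example.

```python
import numpy as np


def softmax(values):
    """Numerically stable softmax over a 1-D array of scores."""
    z = np.asarray(values, dtype=float)
    z = z - z.max()          # shift so the largest exponent is 0
    e = np.exp(z)
    return e / e.sum()


def optimistic_epsilon_greedy(q_values, optimistic_values, epsilon, rng):
    """Pick one agent's action (hypothetical helper, for illustration only).

    q_values:          the agent's ordinary utility estimates, one per action.
    optimistic_values: outputs of the optimistic network for the same actions
                       (assumed here to be unnormalised scores).
    epsilon:           exploration probability in [0, 1].
    rng:               a numpy.random.Generator.
    """
    if rng.random() < epsilon:
        # Exploration step: instead of a uniform random action, sample from
        # the optimistic network's distribution (softmax is an assumption).
        probs = softmax(optimistic_values)
        return int(rng.choice(len(probs), p=probs))
    # Exploitation step: act greedily on the ordinary estimates.
    return int(np.argmax(q_values))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = [1.0, 5.0, 2.0]        # ordinary estimates prefer action 1
    opt = [0.0, 0.0, 4.0]      # optimistic scores prefer action 2
    picks = [optimistic_epsilon_greedy(q, opt, 0.3, rng) for _ in range(1000)]
    print(np.bincount(picks, minlength=3))
```

Compared with standard \(\epsilon\)-greedy, the only change in this sketch is the exploration branch: uniform sampling is replaced by sampling that favours the actions the optimistic network rates highly, which is how the selection frequency of candidate optimal actions goes up.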
Related papers
- Toward Risk-based Optimistic Exploration For Cooperative Multi-agent Reinforcement Learning (2023)
- Centralized Cooperative Exploration Policy For Continuous Control Tasks (2023)
- Intrinsic Action Tendency Consistency For Cooperative Multi-agent Reinforcement Learning (2024)
- Strategically Efficient Exploration In Competitive Multi-agent Reinforcement Learning (2021)
- Fully Decentralized Cooperative Multi-agent Reinforcement Learning: A Survey (2024)
- Prioritized Guidance For Efficient Multi-agent Reinforcement Learning Exploration (2019)
- Cooperative Multi-agent Policy Gradients With Sub-optimal Demonstration (2018)
- Multi-agent Guided Policy Optimization (2025)