Negative Update Intervals In Deep Multi-agent Reinforcement Learning
2018 · Gregory Palmer, Rahul Savani, Karl Tuyls
Abstract
In Multi-Agent Reinforcement Learning (MA-RL), independent cooperative learners must overcome a number of pathologies to learn optimal joint policies. Addressing one pathology often leaves approaches vulnerable to others. For instance, hysteretic Q-learning addresses miscoordination while leaving agents vulnerable to misleading stochastic rewards. Other methods, such as leniency, have proven more robust when dealing with multiple pathologies simultaneously. However, leniency has predominantly been studied within the context of strategic form games (bimatrix games) and fully observable Markov games consisting of a small number of probabilistic state transitions. This raises the question of whether these findings scale to more complex domains. For this purpose, we implement a temporally extended version of the Climb Game, within which agents must overcome multiple pathologies simultaneously, including relative overgeneralisation, stochasticity, the alter-exploration and moving tar
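The abstract's remark that hysteretic Q-learning is vulnerable to misleading stochastic rewards can be illustrated with a minimal sketch. It assumes a stateless, single-action setting and the standard hysteretic rule (a larger learning rate for positive temporal-difference errors than for negative ones). The function names, learning rates, and reward sequence are illustrative assumptions, not taken from the paper.

```python
def hysteretic_update(q, reward, alpha=0.1, beta=0.01):
    """One stateless hysteretic Q-learning step (illustrative sketch).

    alpha is applied to non-negative TD errors and beta to negative ones.
    Choosing beta < alpha makes the learner optimistic; beta == alpha
    recovers the ordinary Q-learning update for this stateless case.
    """
    delta = reward - q  # TD error with no next-state term (stateless assumption)
    rate = alpha if delta >= 0 else beta
    return q + rate * delta


def run(rewards, alpha=0.1, beta=0.01, q0=0.0):
    """Apply the update to a sequence of rewards and return the final estimate."""
    q = q0
    for r in rewards:
        q = hysteretic_update(q, r, alpha, beta)
    return q


if __name__ == "__main__":
    # Hypothetical reward stream alternating between +10 and -10 (mean 0).
    rewards = [10.0, -10.0] * 500
    print("hysteretic (beta < alpha):", run(rewards, alpha=0.1, beta=0.01))
    print("standard   (beta = alpha):", run(rewards, alpha=0.1, beta=0.1))
```

With these assumed settings the hysteretic estimate settles well above the true mean of 0, while the standard update stays close to it. This is the kind of overestimation under stochastic rewards that the abstract refers to.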
Related papers
- Dealing With Non-stationarity In Decentralized Cooperative Multi-agent Deep Reinforcement Learning Via Multi-timescale Learning (2023)
- Lenient Multi-agent Deep Reinforcement Learning (2017)
- Delay-aware Multi-agent Reinforcement Learning For Cooperative And Competitive Environments (2020)
- Hierarchical Deep Multiagent Reinforcement Learning With Temporal Abstraction (2018)
- Iterative Update And Unified Representation For Multi-agent Reinforcement Learning (2019)
- MA2QL: A Minimalist Approach To Fully Decentralized Multi-agent Reinforcement Learning (2022)
- Breaking The Curse Of Multiagency In Robust Multi-agent Reinforcement Learning (2024)
- Chaos Persists In Large-scale Multi-agent Learning Despite Adaptive Learning Rates (2023)