Intrinsic Fluctuations Of Reinforcement Learning Promote Cooperation
2022 · Wolfram Barfuss, Janusz Meylahn
Abstract
In this work, we ask, and answer, what makes classical temporal-difference reinforcement learning with epsilon-greedy strategies cooperative. Cooperating in social dilemma situations is vital for animals, humans, and machines. While evolutionary theory has revealed a range of mechanisms promoting cooperation, the conditions under which agents learn to cooperate are contested. Here, we demonstrate which individual elements of the multi-agent learning setting lead to cooperation, and how. We use the iterated Prisoner's dilemma with one-period memory as a testbed. Each of the two learning agents learns a strategy that conditions its next action choice on both agents' action choices in the last round. We find that, next to a strong weighting of future rewards, a low exploration rate, and a small learning rate, it is primarily the intrinsic stochastic fluctuations of the reinforcement learning process that double the final rate of cooperation to up to 80%. Thus, inherent noise is not a necessary
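The setting described in the abstract can be illustrated with a minimal sketch: two independent epsilon-greedy temporal-difference learners play the iterated Prisoner's dilemma, and each conditions its action on both players' actions in the previous round. Several details here are assumptions made for illustration and are not taken from the paper: the payoff values (5, 3, 1, 0), the use of a Q-learning update as the temporal-difference rule, ties broken in favor of cooperation, the initial state, and all default parameter values. The names `PAYOFF`, `choose`, and `simulate` are hypothetical.

```python
import random

# Assumed payoffs (temptation 5, reward 3, punishment 1, sucker 0);
# the abstract does not state the values used in the paper.
# Actions: 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}


def choose(q_row, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else act greedily.

    Ties are broken in favor of cooperation (an assumption of this sketch).
    """
    if rng.random() < epsilon:
        return rng.randrange(2)
    return 0 if q_row[0] >= q_row[1] else 1


def simulate(rounds=10000, alpha=0.05, gamma=0.9, epsilon=0.05, seed=0):
    """Run two independent learners; return (cooperation rate, Q-tables)."""
    rng = random.Random(seed)
    # One-period memory: the state is the pair of actions from the last round.
    states = [(a, b) for a in (0, 1) for b in (0, 1)]
    q = [{s: [0.0, 0.0] for s in states} for _ in range(2)]
    state = (0, 0)  # arbitrary initial "last round" (assumption)
    cooperative_moves = 0
    for _ in range(rounds):
        actions = tuple(choose(q[i][state], epsilon, rng) for i in range(2))
        rewards = PAYOFF[actions]
        next_state = actions
        for i in range(2):
            # Q-learning target, used here as one example of a
            # temporal-difference update (assumed, not the paper's rule).
            target = rewards[i] + gamma * max(q[i][next_state])
            q[i][state][actions[i]] += alpha * (target - q[i][state][actions[i]])
        cooperative_moves += actions.count(0)
        state = next_state
    return cooperative_moves / (2 * rounds), q


if __name__ == "__main__":
    rate, _ = simulate()
    print(f"fraction of cooperative moves: {rate:.3f}")
```

In this sketch, `gamma` plays the role of caring for future rewards, `epsilon` is the exploration rate, and `alpha` is the learning rate. It is not meant to reproduce the reported cooperation rates; it only makes the roles of the three parameters and the one-period memory concrete.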