On The Emergence Of Cooperation In The Repeated Prisoner's Dilemma
2022 · Maximilian Schaefer
Abstract
Using simulations between pairs of \(\epsilon\)-greedy Q-learners with one-period memory, this article demonstrates that the potential function of the stochastic replicator dynamics (Foster and Young, 1990) can be used to predict the emergence of error-proof cooperative strategies from the underlying parameters of the repeated prisoner's dilemma. The observed cooperation rates between Q-learners are related to the ratio between the kinetic energy exerted by the polar attractors of the replicator dynamics under the grim trigger strategy. The frontier separating the parameter space conducive to cooperation from the parameter space dominated by defection can be found by setting the kinetic energy ratio equal to a critical value. This critical value is a function of the discount factor, \(f(\delta) = \delta/(1-\delta)\), multiplied by a correction term that accounts for the effect of the algorithms' exploration probability. The gradient at the frontier increases with the distance between the game parameters
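To make the simulation setup in the abstract concrete, the following is a minimal sketch of two \(\epsilon\)-greedy Q-learners with one-period memory playing the repeated prisoner's dilemma. It is not the author's code. The payoff values, learning rate `alpha`, zero-initialised Q-tables, initial state, and tie-breaking rule are all assumptions chosen for illustration. Only the function \(f(\delta) = \delta/(1-\delta)\) comes from the abstract, and the sketch does not compute the potential function or the kinetic energy ratio.

```python
import random

# Hypothetical stage-game payoffs with T > R > P > S (not taken from the paper).
T, R, P, S = 5.0, 3.0, 1.0, 0.0
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}
ACTIONS = ("C", "D")
# One-period memory: the state is the action pair played in the previous period.
STATES = [(a, b) for a in ACTIONS for b in ACTIONS]


def critical_value(delta):
    """f(delta) = delta / (1 - delta), the discount-factor term from the abstract."""
    return delta / (1.0 - delta)


def simulate(periods=50_000, alpha=0.1, delta=0.95, epsilon=0.05, seed=0):
    """Run two epsilon-greedy Q-learners and return the share of (C, C) periods."""
    rng = random.Random(seed)
    # One Q-table per player, indexed as q[player][state][action].
    q = [{s: {a: 0.0 for a in ACTIONS} for s in STATES} for _ in range(2)]
    state = ("C", "C")  # arbitrary initial memory (assumption)
    mutual_coop = 0
    for _ in range(periods):
        acts = []
        for i in range(2):
            if rng.random() < epsilon:
                # Explore: pick an action uniformly at random.
                acts.append(rng.choice(ACTIONS))
            else:
                # Exploit: greedy action; ties resolve to "C" (assumption).
                acts.append(max(ACTIONS, key=lambda a: q[i][state][a]))
        joint = (acts[0], acts[1])
        rewards = PAYOFF[joint]
        for i in range(2):
            # Standard Q-learning update toward reward plus discounted best next value.
            target = rewards[i] + delta * max(q[i][joint].values())
            q[i][state][acts[i]] += alpha * (target - q[i][state][acts[i]])
        state = joint
        mutual_coop += joint == ("C", "C")
    return mutual_coop / periods
```

Calling `simulate(delta=0.95, epsilon=0.05)` returns the fraction of periods with mutual cooperation for one seeded run. Sweeping the payoffs, `delta`, and `epsilon` over a grid would give a rough picture of where cooperation emerges under these assumed settings.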
Related papers
- Intrinsic Fluctuations Of Reinforcement Learning Promote Cooperation (2022)
- Towards Cooperation In Sequential Prisoner's Dilemmas: A Deep Multiagent Reinforcement Learning Approach (2018)
- Cooperation Dynamics In Multi-agent Systems: Exploring Game-theoretic Scenarios With Mean-field Equilibria (2023)
- Symmetric Equilibrium Of Multi-agent Reinforcement Learning In Repeated Prisoner's Dilemma (2021)
- The Price Of Paranoia: Robust Risk-sensitive Cooperation In Non-stationary Multi-agent Reinforcement Learning (2026)
- Cooperation And Reputation Dynamics With Reinforcement Learning (2021)
- The Bounds Of Algorithmic Collusion; \(q\)-learning, Gradient Learning, And The Folk Theorem (2024)
- Reciprocal Reward Influence Encourages Cooperation From Self-interested Agents (2024)