Online Learning In Iterated Prisoner's Dilemma To Mimic Human Behavior
2020 · Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi
Abstract
As an important psychological and social experiment, the Iterated Prisoner's Dilemma (IPD) treats the choice to cooperate or defect as an atomic action. We propose to study the behaviors of online learning algorithms in the IPD game, where we investigate the full spectrum of reinforcement learning agents: multi-armed bandits, contextual bandits and reinforcement learning. We evaluate them in an IPD tournament in which multiple agents compete in a sequential fashion. This allows us to analyze the dynamics of policies learned by multiple self-interested, independent, reward-driven agents, and also allows us to study the capacity of these algorithms to fit human behaviors. The results suggest that basing decisions on the current situation is the worst-performing approach in this kind of social dilemma game. Multiple discoveries on online learning behaviors and clinical validations are stated, as an effort to connect artificial intelligence a
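The tournament setup described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the payoff values (the conventional 5/3/1/0 matrix), the `EpsilonGreedyBandit` and `TitForTat` classes, and the `play_match` helper are all hypothetical. The bandit is context-free, so it ignores the opponent's last move, while tit-for-tat serves as a simple hand-coded opponent.

```python
import random

# Conventional IPD payoffs (an assumption; the abstract does not state them).
# Keys are (my_action, opponent_action); "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


class EpsilonGreedyBandit:
    """Hypothetical context-free bandit: keeps a running mean reward per action."""

    def __init__(self, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.counts = {a: 0 for a in ACTIONS}
        self.values = {a: 0.0 for a in ACTIONS}

    def act(self):
        # Explore with probability epsilon, otherwise pick the best-looking arm.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.values[a])

    def update(self, action, reward, opponent_action):
        # Incremental mean update; the opponent's move is deliberately ignored.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


class TitForTat:
    """Hand-coded baseline: cooperate first, then copy the opponent's last move."""

    def __init__(self):
        self.next_action = "C"

    def act(self):
        return self.next_action

    def update(self, action, reward, opponent_action):
        self.next_action = opponent_action


def play_match(agent_a, agent_b, rounds=100):
    """Play one iterated match; return both total scores and the move history."""
    score_a = score_b = 0
    history = []
    for _ in range(rounds):
        act_a, act_b = agent_a.act(), agent_b.act()
        reward_a = PAYOFF[(act_a, act_b)]
        reward_b = PAYOFF[(act_b, act_a)]
        agent_a.update(act_a, reward_a, act_b)
        agent_b.update(act_b, reward_b, act_a)
        score_a += reward_a
        score_b += reward_b
        history.append((act_a, act_b))
    return score_a, score_b, history


if __name__ == "__main__":
    learner = EpsilonGreedyBandit(epsilon=0.1, rng=random.Random(0))
    score_a, score_b, history = play_match(learner, TitForTat(), rounds=200)
    coop_rate = sum(a == "C" for a, _ in history) / len(history)
    print(f"bandit={score_a} tit_for_tat={score_b} bandit_coop_rate={coop_rate:.2f}")
```

A contextual bandit or a full reinforcement learning agent would differ mainly in `act` and `update`, which would condition on the previous round's moves instead of ignoring them.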
Related papers
- Towards Cooperation In Sequential Prisoner's Dilemmas: A Deep Multiagent Reinforcement Learning Approach (2018)
- Learning Through Probing: A Decentralized Reinforcement Learning Architecture For Social Dilemmas (2018)
- Learning To Influence Human Behavior With Offline Reinforcement Learning (2023)
- An Analytical Model Of Active Inference In The Iterated Prisoner's Dilemma (2023)
- Dilution, Diffusion And Symbiosis In Spatial Prisoner's Dilemma With Reinforcement Learning (2025)
- Improved Cooperation By Balancing Exploration And Exploitation In Intertemporal Social Dilemma Tasks (2021)
- Steering Control Of Payoff-maximizing Players In Adaptive Learning Dynamics (2023)
- Cooperative Artificial Intelligence (2022)