Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-Armed Bandit Experiments

Abstract

arXiv:2505.09901v3

Large language models (LLMs) are increasingly used to simulate or automate human behavior in complex sequential decision-making settings. A natural question is whether LLMs exhibit decision-making behavior similar to that of humans, and whether they can achieve comparable (or superior) performance. In this work, we focus on the exploration-exploitation (E&E) tradeoff, a fundamental aspect of dynamic decision-making under uncertainty. We employ canonical multi-armed bandit (MAB) experiments introduced in the cognitive science and psychiatry literature to conduct a comparative study of the E&E strategies of LLMs, humans, and MAB algorithms. We use interpretable choice models to capture the E&E strategies of the agents and investigate how enabling thinking traces, through both prompting strategies and thinking models, shapes LLM decision-making. We find that enabling thinking in LLMs shifts their behavior toward more human-like behavior, ch
