Comparing Exploration-exploitation Strategies Of LLMs And Humans: Insights From Standard Multi-armed Bandit Experiments
2026 · Ziyuan Zhang, Darcy Wang, Ningyuan Chen, et al.
Abstract
arXiv:2505.09901v3
Large language models (LLMs) are increasingly used to simulate or automate human behavior in complex sequential decision-making settings. A natural question is then whether LLMs exhibit similar decision-making behavior to humans, and can achieve comparable (or superior) performance. In this work, we focus on the exploration-exploitation (E&E) tradeoff, a fundamental aspect of dynamic decision-making under uncertainty. We employ canonical multi-armed bandit (MAB) experiments introduced in the cognitive science and psychiatry literature to conduct a comparative study of the E&E strategies of LLMs, humans, and MAB algorithms. We use interpretable choice models to capture the E&E strategies of the agents and investigate how enabling thinking traces, through both prompting strategies and thinking models, shapes LLM decision-making. We find that enabling thinking in LLMs shifts their behavior toward more human-like behavior, ch
Related papers
- Balancing Act: Prioritization Strategies For LLM-designed Restless Bandit Rewards (2024) 0.00
- Unified Models Of Human Behavioral Agents In Bandits, Contextual Bandits And RL (2020) 8.35
- Modeling Human Exploration Through Resource-rational Reinforcement Learning (2022) 2.26
- LLM-explorer: A Plug-in Reinforcement Learning Policy Exploration Enhancement Driven By Large Language Models (2025) 0.00
- Mental Modeling Of Reinforcement Learning Agents By Language Models (2024) 0.00
- From Laws To Motivation: Guiding Exploration Through Law-based Reasoning And Rewards (2024) 0.00
- A Frequency-domain Analysis Of The Multi-armed Bandit Problem: A New Perspective On The Exploration-exploitation Trade-off (2025) 0.00
- Tompo: Training LLM Strategic Decision Making From A Multi-agent Perspective (2025) 0.00