Fitting Reinforcement Learning Model To Behavioral Data Under Bandits
2025 · Hao Zhu, Jasper Hoffmann, Baohe Zhang, et al.
Abstract
We consider the problem of fitting a reinforcement learning (RL) model to some given behavioral data under a multi-armed bandit environment. These models have received much attention in recent years for characterizing human and animal decision making behavior. We provide a generic mathematical optimization problem formulation for the fitting problem of a wide range of RL models that appear frequently in scientific research applications. We then provide a detailed theoretical analysis of its convexity properties. Based on the theoretical results, we introduce a novel solution method for the fitting problem of RL models based on convex relaxation and optimization. Our method is then evaluated in several simulated and real-world bandit environments to compare with some benchmark methods that appear in the literature. Numerical results indicate that our method achieves comparable performance to the state-of-the-art, while significantly reducing computation time. We also provide an open-sou
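To make the fitting problem concrete, the following is a minimal sketch of the standard maximum-likelihood baseline, not the convex-relaxation method the abstract describes. It assumes a Q-learning agent with a softmax policy on a Bernoulli bandit and picks parameters by a plain grid search. The function names (`negative_log_likelihood`, `simulate`, `fit_grid`), the parameters `alpha` (learning rate) and `beta` (inverse temperature), and all numeric settings are illustrative assumptions and do not come from the paper.

```python
import math
import random


def negative_log_likelihood(choices, rewards, alpha, beta, n_arms=2):
    """Negative log-likelihood of choices under Q-learning with softmax.

    Assumed model (illustrative, not taken from the paper):
    P(a) is proportional to exp(beta * Q[a]), and Q[a] += alpha * (r - Q[a]).
    """
    q = [0.0] * n_arms
    nll = 0.0
    for a, r in zip(choices, rewards):
        # Log-softmax of the chosen arm, using the log-sum-exp trick.
        m = max(beta * v for v in q)
        log_z = m + math.log(sum(math.exp(beta * v - m) for v in q))
        nll -= beta * q[a] - log_z
        # Delta-rule update of the chosen arm's value estimate.
        q[a] += alpha * (r - q[a])
    return nll


def simulate(alpha, beta, reward_probs, n_trials, rng):
    """Generate synthetic choices and rewards from the same assumed model."""
    q = [0.0] * len(reward_probs)
    choices, rewards = [], []
    for _ in range(n_trials):
        weights = [math.exp(beta * v) for v in q]
        a = rng.choices(range(len(q)), weights=weights)[0]
        r = 1.0 if rng.random() < reward_probs[a] else 0.0
        q[a] += alpha * (r - q[a])
        choices.append(a)
        rewards.append(r)
    return choices, rewards


def fit_grid(choices, rewards, alphas, betas, n_arms=2):
    """Return (nll, alpha, beta) minimizing the NLL over a parameter grid."""
    best = None
    for alpha in alphas:
        for beta in betas:
            nll = negative_log_likelihood(choices, rewards, alpha, beta, n_arms)
            if best is None or nll < best[0]:
                best = (nll, alpha, beta)
    return best


# Usage: simulate an agent with known parameters, then fit on the same grid.
rng = random.Random(0)
choices, rewards = simulate(0.3, 3.0, [0.2, 0.8], 500, rng)
alphas = [i / 20 for i in range(1, 20)]
betas = [0.5 * i for i in range(1, 21)]
nll_hat, alpha_hat, beta_hat = fit_grid(choices, rewards, alphas, betas)
```

A grid search is used here only because it needs no external solver. The likelihood of such models is in general non-convex in the parameters, which is the difficulty the convexity analysis in the abstract addresses.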
Related papers
- Unified Models Of Human Behavioral Agents In Bandits, Contextual Bandits And RL (2020)
- A Bandit Framework For Optimal Selection Of Reinforcement Learning Agents (2019)
- Online Bayesian Risk-averse Reinforcement Learning (2025)
- Beyond Variance Reduction: Understanding The True Impact Of Baselines On Policy Optimization (2020)
- Provably Efficient Reinforcement Learning For Adversarial Restless Multi-armed Bandits With Unknown Transitions And Bandit Feedback (2024)
- Reinforcement Learning With Convex Constraints (2019)
- Reinforcement Learning Agent Design And Optimization With Bandwidth Allocation Model (2022)
- Towards A Pretrained Model For Restless Bandits Via Multi-arm Generalization (2023)