Dynamic Decision-making Under Model Misspecification: A Stochastic Stability Approach
2026 · Xinyu Dai, Daniel Chen, Yian Qian
Abstract
Dynamic decision-making under model uncertainty is central to many economic environments, yet existing bandit and reinforcement learning algorithms rely on the assumption of correct model specification. This paper studies the behavior and performance of one of the most commonly used Bayesian reinforcement learning algorithms, Thompson Sampling (TS), when the model class is misspecified. We first provide a complete dynamic classification of posterior evolution in a misspecified two-armed Gaussian bandit, identifying distinct regimes: correct model concentration, incorrect model concentration, and persistent belief mixing, characterized by the direction of statistical evidence and the model-action mapping. These regimes yield sharp predictions for limiting beliefs, action frequencies, and asymptotic regret. We then extend the analysis to a general finite model class and develop a unified stochastic stability framework that represents posterior evolution as a Markov process on the belief
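As an illustration of the setting the abstract describes, the following is a minimal sketch of Thompson Sampling over a finite model class in a two-armed Gaussian bandit. It is not the authors' implementation. The function name `thompson_finite_models`, the uniform prior, the known noise scale `sigma`, and the example parameter values are all assumptions made for illustration. Each candidate model is a pair of arm means; the model class is misspecified whenever the true means are not among the candidates.

```python
import numpy as np


def thompson_finite_models(true_means, model_means, horizon=2000, sigma=1.0, seed=0):
    """Hypothetical sketch: Thompson Sampling with a finite set of candidate models.

    true_means  : length-2 sequence, the actual mean reward of each arm.
    model_means : (n_models, 2) array, arm means asserted by each candidate model.
    Returns the final posterior over models and the pull count of each arm.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    model_means = np.asarray(model_means, dtype=float)
    n_models = model_means.shape[0]

    # Uniform prior over models, stored as log-weights (an assumption).
    log_post = np.full(n_models, -np.log(n_models))
    pulls = np.zeros(2, dtype=int)

    for _ in range(horizon):
        # Normalize the posterior in a numerically stable way.
        post = np.exp(log_post - log_post.max())
        post /= post.sum()

        # Thompson step: sample one model, act optimally under it.
        m = rng.choice(n_models, p=post)
        arm = int(np.argmax(model_means[m]))

        # The reward comes from the true environment, not from any model.
        reward = rng.normal(true_means[arm], sigma)
        pulls[arm] += 1

        # Bayes update: Gaussian log-likelihood of the reward under each model.
        log_post += -0.5 * ((reward - model_means[:, arm]) / sigma) ** 2
        log_post -= log_post.max()  # rescale to avoid drift toward -inf

    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return post, pulls


if __name__ == "__main__":
    # Misspecified example: the true means (0.0, 0.5) match neither model.
    posterior, pulls = thompson_finite_models(
        true_means=[0.0, 0.5],
        model_means=[[1.0, 0.0], [0.0, 1.0]],
    )
    print("posterior over models:", posterior)
    print("pulls per arm:", pulls)
```

Varying `true_means` and `model_means` in this sketch is one way to explore how the direction of statistical evidence and the model-action mapping interact; it makes no claim about which of the paper's regimes a given configuration falls into.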
Related papers
- On The Model-misspecification In Reinforcement Learning (2023) 0.00
- Modeling The Effects Of Environmental And Perceptual Uncertainty Using Deterministic Reinforcement Learning Dynamics With Partial Observability (2021) 9.59
- Reinforcement Learning Under Model Mismatch (2017) 0.00
- Bayesian Bandits: Balancing The Exploration-exploitation Tradeoff Via Double Sampling (2017) 0.00
- Learning The Reward Function For A Misspecified Model (2018) 0.00
- Non-stationary Reinforcement Learning: The Blessing Of (more) Optimism (2019) 0.00
- Online Robust Reinforcement Learning With Model Uncertainty (2021) 0.00
- A Mathematical Programming Approach To Computing And Learning Berk-Nash Equilibria In Infinite-horizon MDPs (2026) 0.00