A Mathematical Programming Approach To Computing And Learning Berk-Nash Equilibria In Infinite-Horizon MDPs
2026 · Quanyan Zhu, Zhengye Han
Abstract
We study sequential decision-making when the agent's internal model class is misspecified. Within the infinite-horizon Berk-Nash framework, stable behavior arises as a fixed point: the agent acts optimally relative to a subjective model, while that model is statistically consistent with the long-run data endogenously generated by the policy itself. We provide a rigorous characterization of this equilibrium via coupled linear programs and a bilevel optimization formulation. To address the intrinsic non-smoothness of standard best-response correspondences, we introduce entropy regularization, establishing the existence of a unique soft Bellman fixed point and a smooth objective. Exploiting this regularity, we develop an online learning scheme that casts model selection as an adversarial bandit problem using an EXP3-type update, augmented by a novel conjecture-set zooming mechanism that adaptively refines the parameter space. Numerical results demonstrate effective exploration-exploitatio
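As an illustration of the entropy-regularized step mentioned in the abstract, the following is a minimal sketch of soft value iteration on a small tabular MDP. It assumes the standard log-sum-exp form of the soft Bellman operator, V(s) = tau * log sum_a exp((r(s,a) + gamma * sum_s' P(s'|s,a) V(s')) / tau), which may differ from the exact formulation in the paper. The function names, the temperature `tau`, and the nested-list layout of `P` and `r` are hypothetical choices made for this example.

```python
import math


def soft_q_values(P, r, V, s, gamma):
    """Q(s, a) under the (subjective) model P for every action a in state s."""
    return [
        r[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
        for a in range(len(P[s]))
    ]


def soft_value_iteration(P, r, gamma=0.9, tau=0.5, V0=None, tol=1e-10, max_iter=10_000):
    """Iterate the soft Bellman operator until successive iterates differ by < tol.

    P[s][a][s2] : transition probabilities (assumed layout, rows sum to 1)
    r[s][a]     : rewards (assumed layout)
    """
    V = list(V0) if V0 is not None else [0.0] * len(P)
    for _ in range(max_iter):
        V_new = []
        for s in range(len(P)):
            q = soft_q_values(P, r, V, s, gamma)
            m = max(q)  # subtract the max for a numerically stable log-sum-exp
            V_new.append(m + tau * math.log(sum(math.exp((x - m) / tau) for x in q)))
        diff = max(abs(a - b) for a, b in zip(V_new, V))
        V = V_new
        if diff < tol:
            break
    return V


def soft_policy(P, r, V, gamma=0.9, tau=0.5):
    """Softmax policy pi(a|s) = exp((Q(s,a) - V(s)) / tau) at the fixed point V."""
    return [
        [math.exp((x - V[s]) / tau) for x in soft_q_values(P, r, V, s, gamma)]
        for s in range(len(P))
    ]
```

Because the operator is a gamma-contraction in the sup norm, the iteration reaches the same fixed point from any starting vector `V0`, and the resulting policy varies smoothly with the model parameters, unlike the hard arg-max best response.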
Related papers
- Learning Equilibria In Adversarial Team Markov Games: A Nonconvex-hidden-concave Min-max Optimization Problem (2024) · 0.00
- Parameterized MDPs And Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework (2020) · 8.60
- Asymmetric Nash Seeking Via Best Response Maps: Global Linear Convergence And Robustness To Inexact Reaction Models (2026) · 0.00
- Bayesian Learning Of Optimal Policies In Markov Decision Processes With Countably Infinite State-space (2023) · 0.00
- Can We Find Nash Equilibria At A Linear Rate In Markov Games? (2023) · 0.00
- Learning In Zero-sum Markov Games: Relaxing Strong Reachability And Mixing Time Assumptions (2023) · 0.00
- Empirical Policy Optimization For \(n\)-player Markov Games (2021) · 0.00
- Information-theoretic Methods For Planning And Learning In Partially Observable Markov Decision Processes (2016) · 0.00