Performative Reinforcement Learning With Linear Markov Decision Process
2024 Β· Debmalya Mandal, Goran Radanovic
Abstract
We study the setting of *performative reinforcement learning* where the deployed policy affects both the reward, and the transition of the underlying Markov decision process. Prior work~\parencite\{MTR23\} has addressed this problem under the tabular setting and established last-iterate convergence of repeated retraining with iteration complexity explicitly depending on the number of states. In this work, we generalize the results to *linear Markov decision processes* which is the primary theoretical model of large-scale MDPs. The main challenge with linear MDP is that the regularized objective is no longer strongly convex and we want a bound that scales with the dimension of the features, rather than states which can be infinite. Our first result shows that repeatedly optimizing a regularized objective converges to a *performatively stable policy*. In the absence of strong convexity, our analysis leverages a new recurrence relation that uses a specific linear combination of optimal du
Authors
(none)
Tags
Stats
Related papers
- Performative Reinforcement Learning (2022)0.00
- Online Reinforcement Learning In Markov Decision Process Using Linear Programming (2023)3.58
- Achieve Performatively Optimal Policy For Performative Reinforcement Learning (2025)0.00
- Performative Policy Gradient: Optimality In Performative Reinforcement Learning (2025)0.00
- Nearly Minimax Optimal Reinforcement Learning For Linear Markov Decision Processes (2022)0.00
- Linear-quadratic Mean-field Reinforcement Learning: Convergence Of Policy Gradient Methods (2019)0.00
- A General Markov Decision Process Framework For Directly Learning Optimal Control Policies (2019)0.00
- Globally Optimal Hierarchical Reinforcement Learning For Linearly-solvable Markov Decision Processes (2021)2.26