Performative Reinforcement Learning
2022 · Debmalya Mandal, Stelios Triantafyllou, Goran Radanovic
Abstract
We introduce the framework of performative reinforcement learning, where the policy chosen by the learner affects the underlying reward and transition dynamics of the environment. Following the recent literature on performative prediction (Perdomo et al., 2020), we introduce the concept of a performatively stable policy. We then consider a regularized version of the reinforcement learning problem and show that repeatedly optimizing this objective converges to a performatively stable policy under reasonable assumptions on the transition dynamics. Our proof uses the dual perspective of the reinforcement learning problem and may be of independent interest for analyzing the convergence of other algorithms in decision-dependent environments. We then extend our results to the setting where the learner only performs gradient ascent steps instead of fully optimizing the objective, and to the setting where the learner has access to a finite number of trajectories from the changed environment.
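The repeated-optimization idea can be illustrated in a deliberately tiny setting. The sketch below is a hypothetical toy, not the authors' algorithm. It assumes a single-state environment (a bandit), so the normalized occupancy measure `d` is simply a distribution over actions, and it assumes a quadratic regularizer, so each round solves max over `d` of `<d, r> - (lam/2)||d||^2`. The environment's response to the deployed policy, `r_d(a) = r0[a] - beta * d[a]`, is an invented congestion-style model. A policy is performatively stable when it is a best response to the environment it itself induces. In this toy, the projection onto the simplex is nonexpansive, so full re-optimization is a contraction when `beta < lam`, and the one-gradient-step variant is a contraction when `0 < eta * (beta + lam) < 2`. Both iterations therefore reach the same stable point.

```python
from typing import List, Tuple

# Hypothetical toy: one state, so the occupancy measure d is a distribution
# over actions. The reward model r_d(a) = r0[a] - beta * d[a] is an assumed,
# illustrative response of the environment to the deployed policy d.


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        css += ui
        t = (css - 1.0) / i
        if ui - t > 0:
            theta = t  # threshold for the largest valid support size
    return [max(x - theta, 0.0) for x in v]


def deployed_reward(r0: List[float], d: List[float], beta: float) -> List[float]:
    """Reward the (hypothetical) environment shows after d is deployed."""
    return [ra - beta * da for ra, da in zip(r0, d)]


def best_response(reward: List[float], lam: float) -> List[float]:
    """argmax_d <d, r> - (lam/2)||d||^2 over the simplex equals Proj(r / lam)."""
    return project_to_simplex([r / lam for r in reward])


def repeated_optimization(r0, beta, lam, iters=500, tol=1e-12) -> Tuple[List[float], int]:
    """Fully re-solve the regularized objective against the last deployed environment."""
    d = [1.0 / len(r0)] * len(r0)
    for t in range(1, iters + 1):
        new_d = best_response(deployed_reward(r0, d, beta), lam)
        gap = max(abs(a - b) for a, b in zip(new_d, d))
        d = new_d
        if gap < tol:
            return d, t
    return d, iters


def repeated_gradient_ascent(r0, beta, lam, eta, iters=500, tol=1e-12) -> Tuple[List[float], int]:
    """Take one projected gradient-ascent step per deployment instead of fully optimizing."""
    d = [1.0 / len(r0)] * len(r0)
    for t in range(1, iters + 1):
        r = deployed_reward(r0, d, beta)
        grad = [ra - lam * da for ra, da in zip(r, d)]  # gradient of <d, r> - (lam/2)||d||^2
        new_d = project_to_simplex([da + eta * g for da, g in zip(d, grad)])
        gap = max(abs(a - b) for a, b in zip(new_d, d))
        d = new_d
        if gap < tol:
            return d, t
    return d, iters


if __name__ == "__main__":
    r0, beta, lam = [1.0, 0.8, 0.2], 0.5, 1.0  # beta < lam in this toy
    d_ro, n_ro = repeated_optimization(r0, beta, lam)
    d_ga, n_ga = repeated_gradient_ascent(r0, beta, lam, eta=0.5)
    print([round(x, 4) for x in d_ro])  # → [0.5556, 0.4222, 0.0222]
    # Stability check: d_ro is a best response to the environment it induces.
    fixed = best_response(deployed_reward(r0, d_ro, beta), lam)
    print(max(abs(a - b) for a, b in zip(fixed, d_ro)) < 1e-9)
    print(max(abs(a - b) for a, b in zip(d_ro, d_ga)) < 1e-8)
```

With these assumed parameters, the stable point can be checked by hand as (5/9, 19/45, 1/45). Raising `beta` above `lam` removes the contraction guarantee in this toy, which mirrors in spirit why the abstract's convergence result depends on how strongly the environment reacts to the deployed policy.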
Related papers
- Performative Reinforcement Learning With Linear Markov Decision Process (2024)
- Performative Policy Gradient: Optimality In Performative Reinforcement Learning (2025)
- Achieve Performatively Optimal Policy For Performative Reinforcement Learning (2025)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- The Reinforce Policy Gradient Algorithm Revisited (2023)
- Practical Performative Policy Learning With Strategic Agents (2024)
- Reward-conditioned Policies (2019)
- Unified Policy Optimization For Continuous-action Reinforcement Learning In Non-stationary Tasks And Games (2022)