Performative Policy Gradient: Optimality In Performative Reinforcement Learning
2025 · Debabrota Basu, Udvas Das, Brahim Driss, et al.
Abstract
Once deployed, machine learning algorithms often influence the environments they act in, and thus shift the underlying dynamics, an effect that standard reinforcement learning (RL) methods ignore. While designing optimal algorithms in this performative setting has recently been studied in supervised learning, the RL counterpart remains under-explored. In this paper, we prove the performative counterparts of the performance difference lemma and the policy gradient theorem in RL, and further introduce the Performative Policy Gradient algorithm (PePG). PePG is the first policy gradient algorithm designed to account for performativity in RL. Under softmax parametrisation, both with and without entropy regularisation, we prove that PePG converges to performatively optimal policies, i.e. policies that remain optimal under the distribution shifts induced by themselves. Thus, PePG significantly extends prior work in performative RL, which achieves performative stability but not optimality. Fu
Related papers
- Performative Reinforcement Learning (2022)
- Achieve Performatively Optimal Policy For Performative Reinforcement Learning (2025)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- Performative Reinforcement Learning With Linear Markov Decision Process (2024)
- Smoothing Policies And Safe Policy Gradients (2019)
- Residual Policy Gradient: A Reward View Of KL-regularized Objective (2025)
- PC-PG: Policy Cover Directed Exploration For Provable Policy Gradient Learning (2020)
- Practical Performative Policy Learning With Strategic Agents (2024)