Achieve Performatively Optimal Policy For Performative Reinforcement Learning
2025 · Ziyi Chen, Heng Huang
Abstract
Performative reinforcement learning is an emerging dynamical decision-making framework that extends reinforcement learning to common applications in which the agent's policy can change the environmental dynamics. Existing works on performative reinforcement learning only aim at a performatively stable (PS) policy that maximizes an approximate value function. However, there is a provably positive constant gap between the PS policy and the desired performatively optimal (PO) policy that maximizes the original value function. In contrast, this work proposes a zeroth-order Frank-Wolfe (0-FW) algorithm, which plugs a zeroth-order approximation of the performative policy gradient into the Frank-Wolfe framework, and obtains the first polynomial-time convergence to the desired PO policy under the standard regularizer dominance condition. For the convergence analysis, we prove two important properties of the nonconvex value function. First, when the policy regularizer dominates th
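To make the two ingredients named in the abstract concrete (a gradient estimated only from function values, and a Frank-Wolfe update over the policy space), here is a minimal sketch under stated assumptions. It is not the authors' 0-FW algorithm. Assumptions: a tabular policy stored as a states-by-actions matrix whose rows lie on the probability simplex; a two-point random-direction estimator, a standard zeroth-order choice that may differ from the paper's; the classic step size 2/(k+2); and a hypothetical toy objective `toy_value` in which each action's reward drops as the policy uses it more, plus a quadratic regularizer. All names (`zeroth_order_grad`, `linear_max_oracle`, `zeroth_order_frank_wolfe`, `toy_value`, `R`, `SHIFT`, `LAM`) are illustrative.

```python
import numpy as np


def zeroth_order_grad(f, pi, delta, num_samples, rng):
    """Two-point random-direction estimate of grad f at pi (same shape as pi)."""
    d = pi.size
    grad = np.zeros_like(pi)
    for _ in range(num_samples):
        # Random direction drawn uniformly from the unit sphere in R^d.
        u = rng.standard_normal(pi.shape)
        u /= np.linalg.norm(u)
        # Only function values are queried; no analytic gradient of f is used.
        diff = f(pi + delta * u) - f(pi - delta * u)
        grad += (d / (2.0 * delta)) * diff * u
    return grad / num_samples


def linear_max_oracle(grad):
    """argmax of <grad, v> over a product of simplices: one greedy action per state."""
    v = np.zeros_like(grad)
    v[np.arange(grad.shape[0]), np.argmax(grad, axis=1)] = 1.0
    return v


def zeroth_order_frank_wolfe(f, num_states, num_actions, iters=100,
                             delta=1e-3, num_samples=100, seed=0):
    rng = np.random.default_rng(seed)
    # Start from the uniform policy; each row is a distribution over actions.
    pi = np.full((num_states, num_actions), 1.0 / num_actions)
    for k in range(iters):
        g = zeroth_order_grad(f, pi, delta, num_samples, rng)
        v = linear_max_oracle(g)
        gamma = 2.0 / (k + 2)  # classic Frank-Wolfe step size (an assumption here)
        # A convex combination keeps every row on the probability simplex,
        # so no projection step is needed.
        pi = (1.0 - gamma) * pi + gamma * v
    return pi


# Hypothetical toy objective (NOT the paper's value function): the reward of
# each action decreases as the policy plays it more, a crude "performative"
# effect, minus a quadratic policy regularizer.
R = np.array([[3.0, 0.0, 0.0, 0.0],
              [0.0, 3.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 0.0]])
SHIFT, LAM = 0.5, 1.0


def toy_value(pi):
    shifted_reward = R - SHIFT * pi
    return float(np.sum(shifted_reward * pi) - 0.5 * LAM * np.sum(pi ** 2))


pi_hat = zeroth_order_frank_wolfe(toy_value, num_states=3, num_actions=4)
print(np.argmax(pi_hat, axis=1), round(toy_value(pi_hat), 3))
```

The toy objective happens to be concave, so it only demonstrates the mechanics; the paper's value function is nonconvex and its guarantee relies on the regularizer dominance condition. Frank-Wolfe suits this setting because the linear maximization over a product of simplices is just a per-state argmax, and iterates stay feasible without projection.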
Related papers
- Performative Reinforcement Learning (2022)
- Performative Policy Gradient: Optimality In Performative Reinforcement Learning (2025)
- Performative Reinforcement Learning With Linear Markov Decision Process (2024)
- Provably Efficient Exploration In Policy Optimization (2019)
- Unified Policy Optimization For Continuous-action Reinforcement Learning In Non-stationary Tasks And Games (2022)
- Absolute Policy Optimization (2023)
- Conservative Exploration For Policy Optimization Via Off-policy Policy Evaluation (2023)
- A Theoretical Analysis Of Optimistic Proximal Policy Optimization In Linear Markov Decision Processes (2023)