On The Sample Complexity Of Differentially Private Policy Optimization
2025 · Yi He, Xingyu Zhou
Abstract
Policy optimization (PO) is a cornerstone of modern reinforcement learning (RL), with diverse applications spanning robotics, healthcare, and large language model training. The increasing deployment of PO in sensitive domains, however, raises significant privacy concerns. In this paper, we initiate a theoretical study of differentially private policy optimization, focusing explicitly on its sample complexity. We first formalize an appropriate definition of differential privacy (DP) tailored to PO, addressing the inherent challenges arising from on-policy learning dynamics and the subtlety involved in defining the unit of privacy. We then systematically analyze the sample complexity of widely-used PO algorithms, including policy gradient (PG), natural policy gradient (NPG) and more, under DP constraints and various settings, via a unified framework. Our theoretical results demonstrate that privacy costs can often manifest as lower-order terms in the sample complexity, while also highlig
Related papers
- Simple Policy Optimization (2024)
- Low-switching Policy Gradient With Exploration Via Online Sensitivity Sampling (2023)
- Proximal Policy Optimization Algorithms (2017)
- Differentially Private Policy Evaluation (2016)
- Reparameterization Proximal Policy Optimization (2025)
- Policy Optimization As Online Learning With Mediator Feedback (2020)
- Efficient Differentially Private Fine-tuning Of LLMs Via Reinforcement Learning (2025)
- Policy Optimization With Penalized Point Probability Distance: An Alternative To Proximal Policy Optimization (2018)