Smoothing Policies And Safe Policy Gradients
2019 · Matteo Papini, Matteo Pirotta, Marcello Restelli
Abstract
Policy Gradient (PG) algorithms are among the best candidates for the much-anticipated applications of reinforcement learning to real-world control tasks, such as robotics. However, the trial-and-error nature of these methods poses safety issues whenever the learning process itself must be performed on a physical system or involves any form of human-computer interaction. In this paper, we address a specific safety formulation, where both goals and dangers are encoded in a scalar reward signal and the learning agent is constrained to never worsen its performance, measured as the expected sum of rewards. By studying actor-only policy gradient from a stochastic optimization perspective, we establish improvement guarantees for a wide class of parametric policies, generalizing existing results on Gaussian policies. This, together with novel upper bounds on the variance of policy gradient estimators, allows us to identify meta-parameter schedules that guarantee monotonic improvement with hig
Related papers
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- PC-PG: Policy Cover Directed Exploration For Provable Policy Gradient Learning (2020)
- On The Global Optimality Of Policy Gradient Methods In General Utility Reinforcement Learning (2024)
- Stabilizing Policy Gradient Methods Via Reward Profiling (2025)
- Global Convergence Of Policy Gradient Methods In Reinforcement Learning, Games And Control (2023)
- Policy Gradient Algorithms With Monte Carlo Tree Learning For Non-Markov Decision Processes (2022)
- Model-free Policy Learning With Reward Gradients (2021)
- Stochastic Variance Reduction For Policy Gradient Estimation (2017)