Expected Policy Gradients For Reinforcement Learning
2018 · Kamil Ciosek, Shimon Whiteson
Abstract
We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by Expected SARSA, EPG integrates (or sums) across actions when estimating the gradient, instead of relying only on the action in the sampled trajectory. For continuous action spaces, we first derive a practical result for Gaussian policies and quadratic critics and then extend it to a universal analytical method, covering a broad class of actors and critics, including Gaussian, exponential families, and policies with bounded support. For Gaussian policies, we introduce an exploration method that uses covariance proportional to the matrix exponential of the scaled Hessian of the critic with respect to the actions. For discrete action spaces, we derive a variant of EPG based on softmax policies. We also establish a new general policy gradient theorem, of which the stochastic and deterministic policy gradient theorems are
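The contrast between summing across actions and using only the sampled action can be illustrated with a minimal sketch for a single state with a softmax policy over discrete actions. This is not the authors' implementation: the tabular logits parameterization, the fixed critic values `q_values`, and the function names `expected_gradient` and `sampled_gradient` are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_pi(probs, action):
    # Gradient of log pi(action) w.r.t. the logits of a softmax policy:
    # one_hot(action) - probs.
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]


def expected_gradient(logits, q_values):
    # Hypothetical EPG-style estimate for one state: sum over ALL actions of
    # pi(a) * grad log pi(a) * Q(a), with Q assumed to be given by a critic.
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a, (p, q) in enumerate(zip(probs, q_values)):
        g = grad_log_pi(probs, a)
        for i in range(len(grad)):
            grad[i] += p * q * g[i]
    return grad


def sampled_gradient(logits, q_values, rng):
    # Hypothetical SPG-style estimate for one state: uses only the single
    # action sampled from the policy, so it is noisy.
    probs = softmax(logits)
    a = rng.choices(range(len(probs)), weights=probs)[0]
    return [q_values[a] * gi for gi in grad_log_pi(probs, a)]


if __name__ == "__main__":
    rng = random.Random(0)
    logits, q_values = [0.0, 0.0], [1.0, 0.0]
    print("summed over actions:", expected_gradient(logits, q_values))
    print("one sampled action: ", sampled_gradient(logits, q_values, rng))
```

In this toy setting both estimators have the same mean, but the summed version has no variance from action sampling within the state, which is the property the abstract attributes to integrating (or summing) across actions.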
Related papers
- Expected Policy Gradients (2017)
- Fourier Policy Gradients (2018)
- Smoothing Policies And Safe Policy Gradients (2019)
- All-action Policy Gradient Methods: A Numerical Integration Approach (2019)
- PC-PG: Policy Cover Directed Exploration For Provable Policy Gradient Learning (2020)
- Zeroth-order Deterministic Policy Gradient (2020)
- Deterministic Policy Gradient For Reinforcement Learning With Continuous Time And State (2025)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)