Smoothed Functional-based Gradient Algorithms For Off-policy Reinforcement Learning: A Non-asymptotic Viewpoint
2021 · Nithia Vijayan, Prashanth L. A.
Abstract
We propose two policy gradient algorithms for solving the problem of control in an off-policy reinforcement learning (RL) context. Both algorithms incorporate a smoothed functional (SF) based gradient estimation scheme. The first algorithm is a straightforward combination of importance sampling-based off-policy evaluation with SF-based gradient estimation. The second algorithm, inspired by the stochastic variance-reduced gradient (SVRG) algorithm, incorporates variance reduction in the update iteration. For both algorithms, we derive non-asymptotic bounds that establish convergence to an approximate stationary point. From these results, we infer that the first algorithm converges at a rate that is comparable to the well-known REINFORCE algorithm in an off-policy RL context, while the second algorithm exhibits an improved rate of convergence.
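To make the first idea concrete, the following is a minimal sketch of how an importance-sampling value estimate can be combined with a smoothed functional gradient estimate. It is not the authors' algorithm: the three-armed bandit, the softmax policy, the one-sided Gaussian perturbation scheme, and every name and constant (`is_value`, `sf_gradient`, `run_demo`, `mu`, `num_dirs`, the step size) are assumptions chosen for illustration, and the SVRG-style variance reduction of the second algorithm is not shown.

```python
import math
import random


def softmax(theta):
    """Softmax policy over actions (illustrative parameterization)."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def is_value(theta, data, behavior_probs):
    """Importance-sampling estimate of the target policy's expected reward,
    computed from (action, reward) pairs logged under the behavior policy."""
    pi = softmax(theta)
    return sum(pi[a] / behavior_probs[a] * r for a, r in data) / len(data)


def sf_gradient(theta, data, behavior_probs, mu, num_dirs, rng):
    """One-sided smoothed functional gradient estimate:
    average of u * (J(theta + mu*u) - J(theta)) / mu over Gaussian u.
    The Gaussian perturbation is an assumption of this sketch."""
    d = len(theta)
    base = is_value(theta, data, behavior_probs)
    grad = [0.0] * d
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        shifted = [t + mu * ui for t, ui in zip(theta, u)]
        diff = (is_value(shifted, data, behavior_probs) - base) / mu
        for i in range(d):
            grad[i] += diff * u[i] / num_dirs
    return grad


def run_demo(seed=0, steps=100, step_size=0.5):
    """Toy off-policy run: data come from a uniform behavior policy,
    and the target policy is improved by SF gradient ascent."""
    rng = random.Random(seed)
    mean_rewards = [0.1, 0.5, 0.9]          # hypothetical bandit
    behavior_probs = [1 / 3, 1 / 3, 1 / 3]  # uniform behavior policy
    data = []
    for _ in range(500):
        a = rng.randrange(3)
        data.append((a, mean_rewards[a] + rng.gauss(0.0, 0.1)))
    theta = [0.0, 0.0, 0.0]
    start = is_value(theta, data, behavior_probs)
    for _ in range(steps):
        g = sf_gradient(theta, data, behavior_probs,
                        mu=0.1, num_dirs=10, rng=rng)
        theta = [t + step_size * gi for t, gi in zip(theta, g)]
    return start, is_value(theta, data, behavior_probs), theta


if __name__ == "__main__":
    start, end, theta = run_demo()
    print("estimated value before:", start)
    print("estimated value after: ", end)
```

The gradient estimate only needs evaluations of the importance-sampling value estimate at perturbed parameters, so no policy score function is required. That is the property that distinguishes this style of estimator from a likelihood-ratio method such as REINFORCE.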
Related papers
- A Policy Gradient Approach For Optimization Of Smooth Risk Measures (2022)
- Smoothing Policies And Safe Policy Gradients (2019)
- Batch Reinforcement Learning With A Nonparametric Off-policy Policy Gradient (2020)
- Interpolated Policy Gradient: Merging On-policy And Off-policy Gradient Estimation For Deep Reinforcement Learning (2017)
- Convergence And Optimality Of Policy Gradient Methods In Weakly Smooth Settings (2021)
- Stochastic Variance Reduction For Policy Gradient Estimation (2017)
- Off-policy Policy Gradient Algorithms By Constraining The State Distribution Shift (2019)
- On-policy Policy Gradient Reinforcement Learning Without On-policy Sampling (2023)