Variance-reduced Conservative Policy Iteration
2022 · Naman Agarwal, Brian Bullins, Karan Singh
Abstract
We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reduction-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and thus are unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing an \(\epsilon\)-functional local optimum from \(O(\epsilon^{-4})\) to \(O(\epsilon^{-3})\). Under state-coverage and policy-completeness assumptions, the algorithm enjoys \(\epsilon\)-global optimality after sampling \(O(\epsilon^{-2})\) times, improving upon the previously established \(O(\epsilon^{-3})\) sample requirement.
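For readers unfamiliar with Conservative Policy Iteration, the following is a minimal sketch of its standard mixture step on a tabular policy: the current policy is moved a small fraction toward a candidate policy returned by an empirical risk minimization step. It is an illustration only. The per-state greedy choice stands in for the ERM oracle, the advantage estimates are made-up numbers, and all function names are hypothetical. The variance-reduction scheme proposed in the paper is not described in the abstract and is not reproduced here.

```python
from typing import List

# policy[s][a] = probability of taking action a in state s
Policy = List[List[float]]


def greedy_policy(advantage: List[List[float]]) -> Policy:
    """Stand-in for the ERM step: pick the highest-advantage action per state."""
    policy = []
    for row in advantage:
        best = max(range(len(row)), key=lambda a: row[a])
        policy.append([1.0 if a == best else 0.0 for a in range(len(row))])
    return policy


def conservative_update(current: Policy, candidate: Policy, alpha: float) -> Policy:
    """Mixture update: (1 - alpha) * current + alpha * candidate, state by state."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [
        [(1.0 - alpha) * p + alpha * q for p, q in zip(cur_row, cand_row)]
        for cur_row, cand_row in zip(current, candidate)
    ]


# Hypothetical 2-state, 2-action example with made-up advantage estimates.
uniform = [[0.5, 0.5], [0.5, 0.5]]
advantage_estimates = [[0.2, -0.2], [-0.1, 0.1]]
candidate = greedy_policy(advantage_estimates)
updated = conservative_update(uniform, candidate, alpha=0.1)
```

Because the update is a convex combination of two policies, it is defined directly on the policy as a function of the state and does not depend on how the policy class is parameterized, which is the sense in which the abstract contrasts function-space and parameter-space convergence.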
Related papers
- Sample Efficient Policy Gradient Methods With Recursive Variance Reduction (2019)
- On The Convergence And Sample Efficiency Of Variance-reduced Policy Gradient Method (2021)
- Conservative Exploration For Policy Optimization Via Off-policy Policy Evaluation (2023)
- Conservative Optimistic Policy Optimization Via Multiple Importance Sampling (2021)
- An Improved Convergence Analysis Of Stochastic Variance-reduced Policy Gradient (2019)
- Easy Monotonic Policy Iteration (2016)
- An Approximate Policy Iteration Viewpoint Of Actor-critic Algorithms (2022)
- Variance Reduction For Policy-gradient Methods Via Empirical Variance Minimization (2022)