Reinforcement Learning With Unbiased Policy Evaluation And Linear Function Approximation
2022 · Anna Winnicki, R. Srikant
Abstract
We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes (MDPs). The variant combines stochastic approximation algorithms with state-of-the-art techniques that are useful for very large MDPs, including lookahead, function approximation, and gradient descent. Specifically, we analyze two algorithms. The first takes a least-squares approach: at each iteration, a new set of weights associated with the feature vectors is obtained by least-squares minimization. The second is a two-time-scale stochastic approximation algorithm that takes several steps of gradient descent toward the least-squares solution before obtaining the next iterate with a stochastic approximation update.
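The abstract describes both algorithms only in words, so the following is a minimal Python/NumPy sketch of their overall structure under loudly stated assumptions. It uses a hypothetical 3-state, 2-action MDP whose transition probabilities serve only to simulate rollouts, hand-picked features `Phi`, a 2-step lookahead policy, truncated Monte Carlo returns as policy-evaluation targets (truncation at `horizon` adds a bias of order γ^horizon, so the targets are only approximately unbiased), and a step size α_k = 1/(k+1) for the stochastic-approximation update. All names, parameters, and numbers are illustrative and are not taken from the paper; this is not the authors' exact algorithm, and it does not reproduce their guarantees.

```python
import numpy as np

# Hypothetical 3-state, 2-action MDP (illustration only, not from the paper).
# P[a, s, s2] is the transition probability, R[s, a] the expected reward.
P = np.array([
    [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]],  # action 0
    [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.9, 0.0, 0.1]],  # action 1
])
R = np.array([[0.0, 1.0], [0.5, 0.0], [1.0, 0.2]])
gamma = 0.9
n_states = P.shape[1]
# Assumed feature matrix: row s is the feature vector phi(s) (d = 2).
Phi = np.array([[1.0, 0.0], [1.0, 0.5], [1.0, 1.0]])


def lookahead_policy(theta, H=2):
    """Greedy policy after an H-step lookahead on the approximate values Phi @ theta."""
    V = Phi @ theta
    for _ in range(H - 1):
        V = np.max(R.T + gamma * P @ V, axis=0)  # Bellman optimality backup
    Q = R.T + gamma * P @ V  # Q[a, s]
    return np.argmax(Q, axis=0)  # one action per state


def rollout_return(policy, s, rng, horizon=50):
    """One truncated Monte Carlo sample of the discounted return of `policy` from s."""
    G, discount = 0.0, 1.0
    for _ in range(horizon):
        a = policy[s]
        G += discount * R[s, a]
        discount *= gamma
        s = rng.choice(n_states, p=P[a, s])  # simulate one transition
    return G


def least_squares_weights(targets):
    """Variant 1 (sketch): weights minimising ||Phi @ theta - targets||^2."""
    theta, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return theta


def gradient_weights(theta, targets, steps=20, lr=0.1):
    """Variant 2 (sketch): a few gradient steps toward the least-squares solution."""
    for _ in range(steps):
        grad = Phi.T @ (Phi @ theta - targets)
        theta = theta - lr * grad
    return theta


def policy_iteration(use_gradient=False, iters=20, rollouts=10, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(Phi.shape[1])
    for k in range(iters):
        policy = lookahead_policy(theta)
        # Simulated value targets for every state under the current policy.
        targets = np.array([
            np.mean([rollout_return(policy, s, rng) for _ in range(rollouts)])
            for s in range(n_states)
        ])
        if use_gradient:
            theta_hat = gradient_weights(theta, targets)
        else:
            theta_hat = least_squares_weights(targets)
        # Stochastic-approximation step with step size alpha_k = 1 / (k + 1).
        alpha = 1.0 / (k + 1)
        theta = (1 - alpha) * theta + alpha * theta_hat
    return theta, lookahead_policy(theta)


theta_ls, pi_ls = policy_iteration(use_gradient=False)
theta_gd, pi_gd = policy_iteration(use_gradient=True)
print("least squares: theta =", theta_ls, "policy =", pi_ls)
print("gradient steps: theta =", theta_gd, "policy =", pi_gd)
```

In this sketch the two variants differ only in how `theta_hat` is produced: an exact least-squares solve, or a fixed number of gradient steps started from the current iterate (the faster time scale). Both are then averaged into `theta` by the stochastic-approximation step (the slower time scale).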
Related papers
- The Role Of Lookahead And Approximate Policy Evaluation In Reinforcement Learning With Linear Value Function Approximation (2021)
- Minimax-optimal Off-policy Evaluation With Linear Function Approximation (2020)
- Accelerated And Instance-optimal Policy Evaluation With Linear Function Approximation (2021)
- Provably Efficient Reinforcement Learning With Linear Function Approximation (2019)
- Adaptive Approximate Policy Iteration (2020)
- Improved Regret For Efficient Online Reinforcement Learning With Linear Function Approximation (2023)
- Robust Reinforcement Learning Using Least Squares Policy Iteration With Provable Performance Guarantees (2020)
- Reward-free Model-based Reinforcement Learning With Linear Function Approximation (2021)