Cautiously Optimistic Policy Optimization And Exploration With Linear Function Approximation
2021 · Andrea Zanette, Ching-An Cheng, Alekh Agarwal
Abstract
Policy optimization methods are popular reinforcement learning algorithms because their incremental and on-policy nature makes them more stable than their value-based counterparts. However, the same properties also make them slow to converge and sample-inefficient: the on-policy requirement precludes data reuse, and the incremental updates couple a large iteration complexity into the sample complexity. These characteristics have been observed in experiments as well as in theory in the recent work of Agarwal et al. (2020), which provides a policy optimization method, PCPG, that can robustly find near-optimal policies for approximately linear Markov decision processes but suffers from an extremely poor sample complexity compared with value-based techniques. In this paper, we propose a new algorithm, COPOE, that overcomes the sample complexity issue of PCPG while retaining its robustness to model misspecification. Compared with PCPG, COPOE makes several important algorithmic enhancements,
Related papers
- Provably Efficient Exploration In Policy Optimization (2019)
- Low-switching Policy Gradient With Exploration Via Online Sensitivity Sampling (2023)
- Policy Optimization With Model-based Explorations (2018)
- Optimistic Policy Optimization Is Provably Efficient In Non-stationary MDPs (2021)
- Local Optimization Achieves Global Optimality In Multi-agent Reinforcement Learning (2023)
- Competitive Policy Optimization (2020)
- Linear Function Approximation As A Computationally Efficient Method To Solve Classical Reinforcement Learning Challenges (2024)
- A Theoretical Analysis Of Optimistic Proximal Policy Optimization In Linear Markov Decision Processes (2023)