Generalized Policy Improvement Algorithms With Theoretically Supported Sample Reuse
2022 · James Queeney, Ioannis Ch. Paschalidis, Christos G. Cassandras
Abstract
We develop a new class of model-free deep reinforcement learning algorithms for data-driven, learning-based control. Our Generalized Policy Improvement algorithms combine the policy improvement guarantees of on-policy methods with the efficiency of sample reuse, addressing a trade-off between two important deployment requirements for real-world control: (i) practical performance guarantees and (ii) data efficiency. We demonstrate the benefits of this new class of algorithms through extensive experimental analysis on a broad range of simulated control tasks.
Related papers
- Off-policy RL Algorithms Can Be Sample-efficient For Continuous Control Via Sample Multiple Reuse (2023)
- Theoretically Guaranteed Policy Improvement Distilled From Model-based Planning (2023)
- When To Trust Your Model: Model-based Policy Optimization (2019)
- Reproducibility Of Benchmarked Deep Reinforcement Learning Tasks For Continuous Control (2017)
- Interpolated Policy Gradient: Merging On-policy And Off-policy Gradient Estimation For Deep Reinforcement Learning (2017)
- Conservative Exploration For Policy Optimization Via Off-policy Policy Evaluation (2023)
- Constrained Policy Improvement For Safe And Efficient Reinforcement Learning (2018)
- Neuro-algorithmic Policies Enable Fast Combinatorial Generalization (2021)