Blending Imitation And Reinforcement Learning For Robust Policy Improvement
2023 · Xuefeng Liu, Takuma Yoneda, Rick L. Stevens, et al.
Abstract
While reinforcement learning (RL) has shown promising performance, its sample complexity continues to be a substantial hurdle, restricting its broader application across a variety of domains. Imitation learning (IL) utilizes oracles to improve sample efficiency, yet it is often constrained by the quality of the oracles deployed. To address this, we introduce Robust Policy Improvement (RPI), an algorithm that actively interleaves IL and RL based on an online estimate of their performance. RPI draws on the strengths of IL, using oracle queries to facilitate exploration, an aspect that is notably challenging in sparse-reward RL, particularly during the early stages of learning. As learning unfolds, RPI gradually transitions to RL, effectively treating the learned policy as an improved oracle. This algorithm is capable of learning from and improving upon a diverse set of black-box oracles. Integral to RPI are Robust Active Policy Selection (RAPS) and Robust Policy Gradient (RPG), both of which reason over whether to perform state-wise imitation from the o
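The core idea of reasoning state by state over whether to imitate an oracle or rely on the learner's own estimates can be illustrated with a minimal sketch. It assumes that every black-box oracle and the learner expose a scalar value estimate at each state and that the best-looking candidate is chosen greedily. The names `Candidate`, `select_policy`, `update_mode`, and the toy value functions are hypothetical. This is not the authors' RAPS or RPG procedure, whose details go beyond this abstract.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

State = Tuple[float, ...]  # hypothetical state representation


@dataclass
class Candidate:
    """A black-box oracle or the learner, paired with an online value estimate."""
    name: str
    value_fn: Callable[[State], float]  # assumed estimate of V_k(s), supplied by the caller
    is_learner: bool = False


def select_policy(state: State, candidates: Sequence[Candidate]) -> Candidate:
    """Return the candidate with the highest estimated value at `state` (first wins ties)."""
    return max(candidates, key=lambda c: c.value_fn(state))


def update_mode(state: State, candidates: Sequence[Candidate]) -> str:
    """'RL' if the learner looks best here, else 'IL:<oracle>' (imitate that oracle)."""
    best = select_policy(state, candidates)
    return "RL" if best.is_learner else f"IL:{best.name}"


if __name__ == "__main__":
    # Toy value functions: oracle_a is strong for small s[0], the learner's
    # estimate grows with s[0], and oracle_b is mediocre everywhere.
    candidates = [
        Candidate("oracle_a", lambda s: 1.0 if s[0] < 5 else 0.2),
        Candidate("oracle_b", lambda s: 0.5),
        Candidate("learner", lambda s: 0.1 * s[0], is_learner=True),
    ]
    for s in [(2.0,), (8.0,)]:
        print(s, update_mode(s, candidates))
```

Run as a script, this prints `(2.0,) IL:oracle_a` and then `(8.0,) RL`. In that toy setting the choice starts with imitating an oracle and shifts to learning from the learner's own value estimate once that estimate dominates, loosely mirroring the IL-to-RL transition described above. Treating the learner as just another candidate is what lets it act as an "improved oracle" in this sketch.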
Related papers
- Active Policy Improvement From Multiple Black-box Oracles (2023)
- Policy Improvement Via Imitation Of Multiple Oracles (2020)
- Constrained Policy Improvement For Safe And Efficient Reinforcement Learning (2018)
- Policy Improvement Reinforcement Learning (2026)
- Dual RL: Unification And New Methods For Reinforcement And Imitation Learning (2023)
- Bayesian Robust Optimization For Imitation Learning (2020)
- A New Framework For Query Efficient Active Imitation Learning (2019)
- Learning Self-imitating Diverse Policies (2018)