A Joint Imitation-reinforcement Learning Framework For Reduced Baseline Regret
2022 · Sheelabhadra Dey, Sumedh Pendurkar, Guni Sharon, et al.
Abstract
In various control task domains, existing controllers provide a baseline level of performance that, though possibly suboptimal, should be maintained. Reinforcement learning (RL) algorithms that rely on extensive exploration of the state and action space can be used to optimize a control policy. However, fully exploratory RL algorithms may decrease performance below the baseline level during training. In this paper, we address the issue of online optimization of a control policy while minimizing regret w.r.t. the baseline policy's performance. We present a joint imitation-reinforcement learning framework, denoted JIRL. The learning process in JIRL assumes the availability of a baseline policy and is designed with two objectives in mind: (a) leveraging the baseline's online demonstrations to minimize the regret w.r.t. the baseline policy during training, and (b) eventually surpassing the baseline performance. JIRL addresses these objectives by initially learning to imita
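The abstract frames training in terms of regret w.r.t. a baseline policy but does not state the exact definition. A minimal sketch, assuming a common per-episode definition: over T training episodes, Regret_T = Σ_t (J(π_b) − J(π_t)), where J(π_b) is the baseline's return and J(π_t) is the learner's return in episode t. The function name `baseline_regret` and the `only_shortfall` option (which counts only episodes where the learner falls below the baseline) are hypothetical illustrations, not part of JIRL.

```python
from typing import Sequence


def baseline_regret(
    baseline_returns: Sequence[float],
    learner_returns: Sequence[float],
    only_shortfall: bool = False,
) -> float:
    """Cumulative regret of a learner w.r.t. a baseline over training episodes.

    Hypothetical definition (an assumption, not taken from the paper):
        regret = sum_t (J_baseline_t - J_learner_t)
    If only_shortfall is True, episodes where the learner beats the baseline
    contribute 0 instead of negative regret.
    """
    if len(baseline_returns) != len(learner_returns):
        raise ValueError("return sequences must have equal length")
    total = 0.0
    for b, l in zip(baseline_returns, learner_returns):
        gap = b - l  # positive when the learner underperforms the baseline
        total += max(0.0, gap) if only_shortfall else gap
    return total
```

For example, with baseline returns `[10.0, 10.0, 10.0]` and learner returns `[6.0, 9.0, 12.0]`, the signed regret is 3.0, while the shortfall-only variant is 5.0 because the third episode's gain is not credited. A method that minimizes baseline regret during training aims to keep the per-episode gap small from the start rather than only at convergence.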
Related papers
- Offline Imitation Learning From Multiple Baselines With Applications To Compiler Optimization (2024)
- On-policy Robot Imitation Learning From A Converging Supervisor (2019)
- Bayesian Robust Optimization For Imitation Learning (2020)
- RLIF: Interactive Imitation Learning As Reinforcement Learning (2023)
- A Reduction From Reinforcement Learning To No-regret Online Learning (2019)
- Dual RL: Unification And New Methods For Reinforcement And Imitation Learning (2023)
- The Fallacy Of Minimizing Cumulative Regret In The Sequential Task Setting (2024)
- Blending Imitation And Reinforcement Learning For Robust Policy Improvement (2023)