Offline Imitation Learning From Multiple Baselines With Applications To Compiler Optimization
2024 · Teodor V. Marinov, Alekh Agarwal, Mircea Trofin
Abstract
This work studies a Reinforcement Learning (RL) problem in which we are given a set of trajectories collected with K baseline policies. Each of these policies can be quite suboptimal in isolation yet perform strongly in complementary parts of the state space. The goal is to learn a policy that performs as well as the best combination of baselines on the entire state space. We propose a simple imitation-learning-based algorithm, show a sample complexity bound on its accuracy, and prove that the algorithm is minimax optimal by showing a matching lower bound. Further, we apply the algorithm in the setting of machine-learning-guided compiler optimization to learn policies for inlining programs with the objective of creating a small binary. We demonstrate that, through a few iterations of our approach, we can learn a policy that outperforms an initial policy learned via standard RL.
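The abstract does not spell out the algorithm, so the following is only an illustrative sketch under stated assumptions, not the authors' exact method. It assumes one natural reading of "imitate the best combination of baselines": for each start state (for example, a program to be compiled), keep the rollout of whichever baseline achieved the highest return, then fit a policy to the selected (state, action) pairs by behavior cloning. The `Trajectory` type, the `select_best` and `behavior_clone` helpers, the use of negative binary size as the return, and the toy "inline"/"skip" data are all hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Hashable, Sequence

State = Hashable
Action = Hashable


@dataclass
class Trajectory:
    # Hypothetical container for one baseline rollout.
    start: State                       # context the rollout begins from, e.g. a program
    steps: list[tuple[State, Action]]  # (state, action) pairs chosen by the baseline
    ret: float                         # total return; assumed here to be -binary_size


def select_best(trajs_per_baseline: Sequence[Sequence[Trajectory]]) -> list[Trajectory]:
    """For each start state, keep the highest-return trajectory across all K baselines.

    Ties keep the first trajectory seen, because the comparison is strict.
    """
    best: dict[State, Trajectory] = {}
    for trajs in trajs_per_baseline:
        for t in trajs:
            if t.start not in best or t.ret > best[t.start].ret:
                best[t.start] = t
    return list(best.values())


def behavior_clone(trajs: Sequence[Trajectory]) -> dict[State, Action]:
    """Tabular behavior cloning: map each state to its most frequent selected action."""
    counts: dict[State, Counter] = defaultdict(Counter)
    for t in trajs:
        for s, a in t.steps:
            counts[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


# Toy example: baseline A is better on prog1, baseline B is better on prog2.
baseline_a = [
    Trajectory("prog1", [("p1_call", "inline")], ret=-10.0),
    Trajectory("prog2", [("p2_call", "inline")], ret=-50.0),
]
baseline_b = [
    Trajectory("prog1", [("p1_call", "skip")], ret=-20.0),
    Trajectory("prog2", [("p2_call", "skip")], ret=-30.0),
]
policy = behavior_clone(select_best([baseline_a, baseline_b]))
print(policy)  # {'p1_call': 'inline', 'p2_call': 'skip'}
```

The learned lookup table combines the two baselines, taking each one's decision where it did best. A lookup table only covers states seen in the data; a practical system would replace it with a learned classifier that generalizes to unseen call sites.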
Related papers
- A Joint Imitation-reinforcement Learning Framework For Reduced Baseline Regret (2022)
- Conservative Optimistic Policy Optimization Via Multiple Importance Sampling (2021)
- Bridging Offline Reinforcement Learning And Imitation Learning: A Tale Of Pessimism (2021)
- A Policy-guided Imitation Approach For Offline Reinforcement Learning (2022)
- Beyond Variance Reduction: Understanding The True Impact Of Baselines On Policy Optimization (2020)
- Curriculum Offline Imitation Learning (2021)
- Dual RL: Unification And New Methods For Reinforcement And Imitation Learning (2023)
- Offline Imitation Learning By Controlling The Effective Planning Horizon (2024)