Residual Q-learning: Offline And Online Policy Customization Without Value
2023 · Chenran Li, Chen Tang, Haruki Nishimura, et al.
Abstract
Imitation Learning (IL) is a widely used framework for learning imitative behavior from demonstrations. It is especially appealing for solving complex real-world tasks where handcrafting a reward function is difficult, or when the goal is to mimic human expert behavior. However, the learned imitative policy can only follow the behavior in the demonstrations. When applying the imitative policy, we may need to customize its behavior to meet different requirements coming from diverse downstream tasks. Meanwhile, we still want the customized policy to maintain its imitative nature. To this end, we formulate a new problem setting called policy customization. It defines the learning task as training a policy that inherits the characteristics of the prior policy while satisfying additional requirements imposed by a target downstream task. We propose a novel and principled approach to interpret and determine the trade-off between the two task objectives. Specifically, we formulate the
Related papers
- Curriculum Offline Imitation Learning (2021)
- Explaining Fast Improvement In Online Imitation Learning (2020)
- Efficient Offline Reinforcement Learning: First Imitate, Then Improve (2024)
- A Policy-guided Imitation Approach For Offline Reinforcement Learning (2022)
- Mitigating Covariate Shift In Imitation Learning Via Offline Data Without Great Coverage (2021)
- Online Adaptation For Enhancing Imitation Learning Policies (2024)
- Policy Agnostic RL: Offline RL And Online RL Fine-tuning Of Any Class And Backbone (2024)
- Offline Imitation Learning By Controlling The Effective Planning Horizon (2024)