LISPR: An Options Framework For Policy Reuse With Reinforcement Learning
2020 · Daniel Graves, Jun Jin, Jun Luo
Abstract
We propose a framework for transferring any existing policy from a potentially unknown source MDP to a target MDP. This framework (1) enables reuse in the target domain of any form of source policy, including classical controllers, heuristic policies, or deep neural network-based policies, (2) attains optimality under suitable theoretical conditions, and (3) guarantees improvement over the source policy in the target MDP. These are achieved by packaging the source policy as a black-box option in the target MDP and providing a theoretically grounded way to learn the option's initiation set through general value functions. Our approach facilitates the learning of new policies by (1) maximizing the target MDP reward with the help of the black-box option, and (2) returning the agent to states in the learned initiation set of the black-box option where it is already optimal. We show that these two variants are equivalent in performance under some conditions. Through a series of experiments
Related papers
- An Efficient Transfer Learning Framework For Multiagent Reinforcement Learning (2020) · 0.00
- IOB: Integrating Optimization Transfer And Behavior Transfer For Multi-policy Reuse (2023) · 5.24
- Lever: Inference-time Policy Reuse Under Support Constraints (2026) · 0.00
- A General Markov Decision Process Framework For Directly Learning Optimal Control Policies (2019) · 0.00
- Reinforcement Learning In POMDPs With Memoryless Options And Option-observation Initiation Sets (2017) · 6.77
- Post-convergence Sim-to-real Policy Transfer: A Principled Alternative To Cherry-picking (2025) · 0.00
- Context-aware Policy Reuse (2018) · 0.00
- LESSON: Learning To Integrate Exploration Strategies For Reinforcement Learning Via An Option Framework (2023) · 0.00