Lever: Inference-time Policy Reuse Under Support Constraints
2026 · Ihor Vitenko, Noha Ibrahim, Sihem Amer-Yahia
Abstract
arXiv:2604.20174v2
Reinforcement learning (RL) policies are typically trained for fixed objectives, making reuse difficult when task requirements change. We study inference-time policy reuse: given a library of pre-trained policies and a new composite objective, can a high-quality policy be constructed entirely offline, without additional environment interaction? We introduce lever (Leveraging Efficient Vector Embeddings for Reusable policies), an end-to-end framework that retrieves relevant policies, evaluates them using behavioral embeddings, and composes new policies via offline Q-value composition. We focus on the support-limited regime, where no value propagation is possible, and show that the effectiveness of reuse depends critically on the coverage of available transitions. To balance performance and computational cost, lever proposes composition strategies that control the exploration of candidate policies. Experiments in deterministic GridWorl
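As a rough illustration of what offline Q-value composition under a support constraint could look like, the sketch below combines two tabular Q-functions with a weighted sum and then acts greedily only over state-action pairs that appear in the logged transitions. The abstract does not specify the composition rule, so the linear weighting, the function names (`compose_q`, `support_constrained_policy`), and the toy values are all assumptions made for illustration, not the paper's method.

```python
from collections import defaultdict


def compose_q(q_tables, weights):
    """Weighted sum of tabular Q-functions (an assumed composition rule).

    Only (state, action) pairs present in every table are composed, so no
    value is invented for a pair that some source policy never evaluated.
    """
    shared = set.intersection(*(set(q) for q in q_tables))
    return {
        sa: sum(w * q[sa] for q, w in zip(q_tables, weights))
        for sa in shared
    }


def support_constrained_policy(q, transitions):
    """Greedy policy restricted to actions observed in the offline data."""
    supported = defaultdict(set)
    for state, action in transitions:
        supported[state].add(action)
    return {
        state: max(sorted(actions), key=lambda a: q.get((state, a), float("-inf")))
        for state, actions in supported.items()
    }


# Hypothetical toy library: two pre-trained objectives over one state.
q_speed = {("s0", "left"): 1.0, ("s0", "right"): 3.0, ("s0", "up"): 9.0}
q_safety = {("s0", "left"): 4.0, ("s0", "right"): 1.0, ("s0", "up"): 9.0}

# Logged transitions never contain "up", so it is outside the support.
transitions = [("s0", "left"), ("s0", "right")]

q_composed = compose_q([q_speed, q_safety], [0.5, 0.5])
policy = support_constrained_policy(q_composed, transitions)
print(policy)  # {'s0': 'left'}
```

In this toy case the unconstrained greedy action would be "up" (composed value 9.0), but it is never observed in the data, so the constrained policy falls back to "left" (2.5 versus 2.0 for "right"). This is one way to see why the quality of the reused policy depends on which transitions the offline data covers.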