Active Policy Improvement From Multiple Black-box Oracles
2023 · Xuefeng Liu, Takuma Yoneda, Chaoqi Wang, et al.
Abstract
Reinforcement learning (RL) has made significant strides in various complex domains. However, identifying an effective policy via RL often necessitates extensive exploration. Imitation learning aims to mitigate this issue by using expert demonstrations to guide exploration. In real-world scenarios, one often has access to multiple suboptimal black-box experts, rather than a single optimal oracle. These experts do not universally outperform each other across all states, presenting a challenge in actively deciding which oracle to use and in which state. We introduce MAPS and MAPS-SE, a class of policy improvement algorithms that perform imitation learning from multiple suboptimal oracles. In particular, MAPS actively selects which of the oracles to imitate and improve their value function estimates, and MAPS-SE additionally leverages an active state exploration criterion to determine which states one should explore. We provide a comprehensive theoretical analysis and demonstrate that MAP
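The idea of choosing, state by state, which oracle to imitate can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the selection rule, so the optimistic score used here (mean value estimate plus an uncertainty bonus) is an assumption, and the names `select_oracle`, `value_samples`, `bonus`, `oracle_1`, and `oracle_2` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def select_oracle(value_samples: Dict[str, List[float]], bonus: float = 1.0) -> str:
    """Pick the oracle with the highest optimistic value estimate at one state.

    value_samples maps a (hypothetical) oracle name to several value
    predictions for the same state, e.g. from an ensemble of value estimators.
    The mean-plus-spread score is an assumed rule, not taken from the paper.
    """
    scores = {}
    for name, samples in value_samples.items():
        # The mean is the value estimate; the spread stands in for uncertainty.
        scores[name] = mean(samples) + bonus * pstdev(samples)
    return max(scores, key=scores.get)


# Two states in which a different oracle looks better.
state_a = {"oracle_1": [0.9, 1.0, 1.1], "oracle_2": [0.4, 0.5, 0.6]}
state_b = {"oracle_1": [0.2, 0.2, 0.2], "oracle_2": [0.7, 0.8, 0.9]}
choice_a = select_oracle(state_a)
choice_b = select_oracle(state_b)
```

Because no oracle dominates in every state, the selection is made per state: here the two example states yield different choices. Setting `bonus` to zero reduces the rule to picking the highest mean estimate.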
Related papers
- Policy Improvement Via Imitation Of Multiple Oracles (2020)
- Blending Imitation And Reinforcement Learning For Robust Policy Improvement (2023)
- Model-based Reinforcement Learning With Double Oracle Efficiency In Policy Optimization And Offline Estimation (2026)
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018)
- Think Outside The Policy: In-context Steered Policy Optimization (2025)
- Some Supervision Required: Incorporating Oracle Policies In Reinforcement Learning Via Epistemic Uncertainty Metrics (2022)
- Online Matching Via Reinforcement Learning: An Expert Policy Orchestration Strategy (2025)
- Learning Self-imitating Diverse Policies (2018)