Meta-Q-Learning
2019 · Rasool Fakoor, Pratik Chaudhari, Stefano Soatto, et al.
Abstract
This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-Reinforcement Learning (meta-RL). MQL builds upon three simple ideas. First, we show that Q-learning is competitive with state-of-the-art meta-RL algorithms if given access to a context variable that is a representation of the past trajectory. Second, a multi-task objective to maximize the average reward across the training tasks is an effective method to meta-train RL policies. Third, past data from the meta-training replay buffer can be recycled to adapt the policy on a new task using off-policy updates. MQL draws upon ideas in propensity estimation to do so and thereby amplifies the amount of available data for adaptation. Experiments on standard continuous-control benchmarks suggest that MQL compares favorably with the state of the art in meta-RL.
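The third idea, reusing meta-training data through propensity estimation, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a plain logistic-regression classifier is trained to tell new-task samples from replay-buffer samples, and that its odds are used as importance weights for the old samples. The helper names (`fit_logistic`, `propensity_weights`, `normalized_ess`), the hyperparameters, and the toy Gaussian data are hypothetical. The normalized effective sample size is included only as a standard diagnostic of how much old data remains useful after reweighting.

```python
import numpy as np


def fit_logistic(x, y, lr=0.1, steps=500, l2=1e-3):
    """Fit logistic regression by gradient descent; returns weights with bias last."""
    xb = np.hstack([x, np.ones((len(x), 1))])  # append a bias column
    w = np.zeros(xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(xb @ w)))  # predicted P(new | x)
        w -= lr * (xb.T @ (p - y) / len(y) + l2 * w)
    return w


def propensity_weights(x_old, x_new):
    """Estimate p_new(x) / p_old(x) for every old sample (illustrative assumption)."""
    x = np.vstack([x_old, x_new])
    y = np.concatenate([np.zeros(len(x_old)), np.ones(len(x_new))])  # 1 = new task
    w = fit_logistic(x, y)
    logits = np.hstack([x_old, np.ones((len(x_old), 1))]) @ w
    # Classifier odds P(new|x)/P(old|x) equal exp(logit); multiplying by
    # n_old/n_new removes the class-prior imbalance from the ratio.
    return np.exp(logits) * (len(x_old) / len(x_new))


def normalized_ess(weights):
    """Normalized effective sample size in (0, 1]; 1 means all weights are equal."""
    return weights.sum() ** 2 / (len(weights) * (weights ** 2).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    buffer_data = rng.normal(0.0, 1.0, size=(400, 2))  # stand-in for old transitions
    new_data = rng.normal(1.0, 1.0, size=(100, 2))     # stand-in for new-task transitions
    weights = propensity_weights(buffer_data, new_data)
    print("normalized ESS:", normalized_ess(weights))
```

In an off-policy update, weights of this kind would multiply the per-sample loss on old transitions, so that buffer samples resembling the new task count more. A low normalized effective sample size signals that little of the old data is relevant to the new task.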
Related papers
- Model-based Adversarial Meta-reinforcement Learning (2020)
- A Tutorial On Meta-reinforcement Learning (2023)
- Meta-reinforcement Learning With Universal Policy Adaptation: Provable Near-optimality Under All-task Optimum Comparator (2024)
- RL\(^3\): Boosting Meta Reinforcement Learning Via RL Inside RL\(^2\) (2023)
- Guided Meta-policy Search (2019)
- Context Meta-reinforcement Learning Via Neuromodulation (2021)
- Meta-model-based Meta-policy Optimization (2020)
- ProMP: Proximal Meta-policy Search (2018)