VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
2019 · Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, et al.
Abstract
Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but also on the agent's uncertainty about the environment. Computing a Bayes-optimal policy is, however, intractable for all but the smallest tasks. In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment and to incorporate task uncertainty directly during action selection. In a grid-world domain, we illustrate how variBAD performs structured online exploration as a function of task uncertainty. We further evaluate variBAD on MuJoCo domains widely used in meta-RL and show that it achieves higher online return than existing methods.
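To make concrete what it means for a policy to condition on uncertainty, here is a minimal sketch of exact Bayes-optimal planning on a toy problem that is small enough to solve. This is not variBAD and is not taken from the paper. It assumes a hypothetical two-armed bandit with one arm of known mean reward (`KNOWN_ARM_MEAN`) and one Bernoulli arm whose success probability has a uniform Beta(1, 1) prior. The belief is the pair of observed success and failure counts, and all function names are illustrative.

```python
from functools import lru_cache

# Assumption for illustration: the known arm pays 0.5 in expectation.
KNOWN_ARM_MEAN = 0.5


def q_values(successes, failures, steps_left):
    """Return (Q_unknown, Q_known) under the belief Beta(1 + successes, 1 + failures)."""
    # Posterior mean of the unknown arm's success probability.
    p = (successes + 1) / (successes + failures + 2)
    # Pulling the unknown arm yields a reward AND updates the belief.
    q_unknown = (
        p * (1.0 + value(successes + 1, failures, steps_left - 1))
        + (1 - p) * value(successes, failures + 1, steps_left - 1)
    )
    # Pulling the known arm yields its mean reward and leaves the belief unchanged.
    q_known = KNOWN_ARM_MEAN + value(successes, failures, steps_left - 1)
    return q_unknown, q_known


@lru_cache(maxsize=None)
def value(successes, failures, steps_left):
    """Expected return of the Bayes-optimal policy from this belief."""
    if steps_left <= 0:
        return 0.0
    return max(q_values(successes, failures, steps_left))


def bayes_action(successes, failures, steps_left):
    """Action chosen as a function of the belief, not only of the environment state."""
    q_unknown, q_known = q_values(successes, failures, steps_left)
    return "unknown" if q_unknown > q_known else "known"
```

With two steps left and no observations, both arms have the same expected immediate reward, yet `bayes_action(0, 0, 2)` selects the unknown arm because the observation it produces improves the final decision. After one failure, `bayes_action(0, 1, 1)` switches to the known arm. The recursion enumerates every reachable belief, which is why this exact approach only works for very small tasks.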
Related papers
- Varibased: Variational Bayes-adaptive Sequential Monte-carlo Planning For Deep Reinforcement Learning (2026)
- Deep Interactive Bayesian Reinforcement Learning Via Meta-learning (2021)
- Offline Meta Learning Of Exploration (2020)
- Efficient Off-policy Meta-reinforcement Learning Via Probabilistic Context Variables (2019)
- VIME: Variational Information Maximizing Exploration (2016)
- Meta-reinforcement Learning With Universal Policy Adaptation: Provable Near-optimality Under All-task Optimum Comparator (2024)
- First-explore, Then Exploit: Meta-learning To Solve Hard Exploration-exploitation Trade-offs (2023)
- Bayesian Exploration Networks (2023)