Meta-trained Agents Implement Bayes-optimal Agents
2020 · Vladimir Mikulik, Grégoire Delétang, Tom McGrath, et al.
Abstract
Memory-based meta-learning is a powerful technique to build agents that adapt fast to any task within a target distribution. A previous theoretical study has argued that this remarkable performance is because the meta-training protocol incentivises agents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical computer science, we show that meta-learned and Bayes-optimal agents not only behave alike, but they even share a similar computational structure, in the sense that one agent system can approximately simulate the other. Furthermore, we show that Bayes-optimal agents are fixed points of the meta-learning dynamics. Our results suggest that memory-based meta-learning might serve as a general technique for numerically approximating Bayes-optimal agents - that is, even for task distributions for which we currently don't possess tractable models.
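To make "Bayes-optimal" concrete on a prediction task, here is a minimal sketch. It assumes a coin-flip task whose bias is drawn from a Beta(alpha, beta) prior; the task, the function name `bayes_optimal_predictions`, and the prior parameters are illustrative assumptions and are not taken from the paper. In this setting the Bayes-optimal predictor only needs to track two counts, which is the kind of sufficient statistic a memory-based meta-learner would have to carry in its internal state in order to behave the same way.

```python
from fractions import Fraction


def bayes_optimal_predictions(observations, alpha=1, beta=1):
    """Posterior-predictive P(next flip = 1) under a Beta-Bernoulli model.

    Assumption (not from the paper): the coin bias is drawn from
    Beta(alpha, beta) with integer parameters, and each task is a
    sequence of 0/1 flips from that coin.

    Returns one prediction before any data is seen, then one more
    after each observation.
    """
    ones, zeros = alpha, beta  # pseudo-counts from the prior
    predictions = [Fraction(ones, ones + zeros)]
    for x in observations:
        if x == 1:
            ones += 1
        else:
            zeros += 1
        # The posterior mean of the bias is the optimal prediction under log loss.
        predictions.append(Fraction(ones, ones + zeros))
    return predictions


preds = bayes_optimal_predictions([1, 1, 0])
print([str(p) for p in preds])
```

With a uniform prior (alpha = beta = 1) this reduces to Laplace's rule of succession. A meta-trained recurrent agent could be compared against these predictions step by step on sequences sampled from the same task distribution.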
Related papers
- Deep Interactive Bayesian Reinforcement Learning Via Meta-learning (2021)
- Metacontrol For Adaptive Imagination-based Optimization (2017)
- Dynamic Memory For Interpretable Sequential Optimisation (2022)
- Double Meta-learning For Data Efficient Policy Optimization In Non-stationary Environments (2020)
- Meta Reinforcement Learning With Finite Training Tasks -- A Density Estimation Approach (2022)
- VariBAD: A Very Good Method For Bayes-adaptive Deep RL Via Meta-learning (2019)
- On The Effectiveness Of Fine-tuning Versus Meta-reinforcement Learning (2022)
- Efficient Meta Reinforcement Learning For Preference-based Fast Adaptation (2022)