Offline Meta Learning Of Exploration
2020 · Ron Dorfman, Idan Shenfeld, Aviv Tamar
Abstract
Consider the following instance of the Offline Meta Reinforcement Learning (OMRL) problem: given the complete training logs of \(N\) conventional RL agents, trained on \(N\) different tasks, design a meta-agent that can quickly maximize reward in a new, unseen task from the same task distribution. In particular, while each conventional RL agent explored and exploited its own different task, the meta-agent must identify regularities in the data that lead to effective exploration/exploitation in the unseen task. Here, we take a Bayesian RL (BRL) view, and seek to learn a Bayes-optimal policy from the offline data. Building on the recent VariBAD BRL approach, we develop an off-policy BRL method that learns to plan an exploration strategy based on an adaptive neural belief estimate. However, learning to infer such a belief from offline data brings a new identifiability issue we term MDP ambiguity. We characterize the problem, and suggest resolutions via data collection and modification pro
Related papers
- Offline Meta-reinforcement Learning With Online Self-supervision (2021)
- Scrutinize What We Ignore: Reining In Task Representation Shift Of Context-based Offline Meta Reinforcement Learning (2024)
- FOCAL: Efficient Fully-offline Meta-reinforcement Learning Via Distance Metric Learning And Behavior Regularization (2020)
- Decoupling Exploration And Exploitation For Meta-reinforcement Learning Without Sacrifices (2020)
- First-explore, Then Exploit: Meta-learning To Solve Hard Exploration-exploitation Trade-offs (2023)
- Learning Off-policy With Model-based Intrinsic Motivation For Active Online Exploration (2024)
- Conservative Bayesian Model-based Value Expansion For Offline Policy Optimization (2022)
- Offline Retraining For Online RL: Decoupled Policy Learning To Mitigate Exploration Bias (2023)