Characterizing Policy Divergence For Personalized Meta-reinforcement Learning
2020 · Michael Zhang
Abstract
Despite ample motivation from costly exploration and limited trajectory data, rapidly adapting to new environments with few-shot reinforcement learning (RL) can remain a challenging task, especially with respect to personalized settings. Here, we consider the problem of recommending optimal policies to a set of multiple entities each with potentially different characteristics, such that individual entities may parameterize distinct environments with unique transition dynamics. Inspired by existing literature in meta-learning, we extend previous work by focusing on the notion that certain environments are more similar to each other than others in personalized settings, and propose a model-free meta-learning algorithm that prioritizes past experiences by relevance during gradient-based adaptation. Our algorithm involves characterizing past policy divergence through methods in inverse reinforcement learning, and we illustrate how such metrics are able to effectively distinguish past polic
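The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of relevance-weighted, gradient-based adaptation. It is not the paper's method. It rests on three assumptions. First, each past entity is summarized by a linear reward-weight vector recovered with some inverse RL method. Second, policy divergence is approximated as the Euclidean distance between those vectors. Third, relevance is a softmax over negative divergence. All names and parameters here (`reward_divergence`, `relevance_weights`, `weighted_adaptation_step`, `temperature`, `lr`) are hypothetical.

```python
import math
from typing import List, Sequence


def reward_divergence(w_a: Sequence[float], w_b: Sequence[float]) -> float:
    """Euclidean distance between two hypothetical linear reward-weight vectors.

    Assumption: stands in for an IRL-based divergence between past policies.
    """
    if len(w_a) != len(w_b):
        raise ValueError("reward vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w_a, w_b)))


def relevance_weights(target_w: Sequence[float],
                      past_ws: Sequence[Sequence[float]],
                      temperature: float = 1.0) -> List[float]:
    """Softmax over negative divergences: past entities closer to the target get more weight."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scores = [-reward_divergence(target_w, w) / temperature for w in past_ws]
    max_score = max(scores)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_adaptation_step(theta: Sequence[float],
                             past_grads: Sequence[Sequence[float]],
                             weights: Sequence[float],
                             lr: float = 0.1) -> List[float]:
    """One gradient-descent step on a relevance-weighted average of per-entity loss gradients."""
    combined = [sum(w * g[i] for w, g in zip(weights, past_grads))
                for i in range(len(theta))]
    return [t - lr * c for t, c in zip(theta, combined)]


if __name__ == "__main__":
    target_w = [1.0, 0.0]
    past_ws = [[1.0, 0.1], [0.0, 1.0]]  # the first past entity has a similar reward
    weights = relevance_weights(target_w, past_ws)
    # The first (more similar) entity receives most of the weight.
    print(weights)
    theta = weighted_adaptation_step([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], weights)
    print(theta)
```

The `temperature` parameter controls how sharply relevance concentrates on the most similar past entities. A very large value approaches uniform weighting, which is roughly what a relevance-agnostic meta-learner would use.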
Related papers
- Learning Self-imitating Diverse Policies (2018)
- Efficient Meta Reinforcement Learning For Preference-based Fast Adaptation (2022)
- Distributionally Adaptive Meta Reinforcement Learning (2022)
- Diversity-inducing Policy Gradient: Using Maximum Mean Discrepancy To Find A Set Of Diverse Policies (2019)
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)
- Guided Meta-policy Search (2019)
- Open-ended Diverse Solution Discovery With Regulated Behavior Patterns For Cross-domain Adaptation (2022)
- Learning To Explore With Meta-policy Gradient (2018)