One Solution Is Not All You Need: Few-shot Extrapolation Via Structured Maxent RL
2020 · Saurabh Kumar, Aviral Kumar, Sergey Levine, et al.
Abstract
While reinforcement learning algorithms can learn effective policies for complex tasks, these policies are often brittle to even minor task variations, especially when variations are not explicitly provided during training. One natural approach to this problem is to train agents with manually specified variation in the training task or environment. However, this may be infeasible in practical situations, either because making perturbations is not possible, or because it is unclear how to choose suitable perturbation strategies without sacrificing performance. The key insight of this work is that learning diverse behaviors for accomplishing a task can directly lead to behavior that generalizes to varying environments, without needing to perform explicit perturbations during training. By identifying multiple solutions for the task in a single environment during training, our approach can generalize to new situations by abandoning solutions that are no longer effective and adopting those
Related papers
- Diversity For Contingency: Learning Diverse Behaviors For Efficient Adaptation And Transfer (2023)
- Model-agnostic Solutions For Deep Reinforcement Learning In Non-ergodic Contexts (2026)
- Dense And Diverse Goal Coverage In Multi Goal Reinforcement Learning (2025)
- Discovering Multiple Solutions From A Single Task In Offline Reinforcement Learning (2024)
- Emergent Complexity And Zero-shot Transfer Via Unsupervised Environment Design (2020)
- Open-ended Diverse Solution Discovery With Regulated Behavior Patterns For Cross-domain Adaptation (2022)
- Learning Self-imitating Diverse Policies (2018)
- Post-convergence Sim-to-real Policy Transfer: A Principled Alternative To Cherry-picking (2025)