Unsupervised Learning And Exploration Of Reachable Outcome Space
2019 · Giuseppe Paolo, Alban Laflaquière, Alexandre Coninx, et al.
Abstract
Performing Reinforcement Learning in sparse rewards settings, with very little prior knowledge, is a challenging problem since there is no signal to properly guide the learning process. In such situations, a good search strategy is fundamental. At the same time, not having to adapt the algorithm to every single problem is very desirable. Here we introduce TAXONS, a Task Agnostic eXploration of Outcome spaces through Novelty and Surprise algorithm. Based on a population-based divergent-search approach, it learns a set of diverse policies directly from high-dimensional observations, without any task-specific information. TAXONS builds a repertoire of policies while training an autoencoder on the high-dimensional observation of the final state of the system to build a low-dimensional outcome space. The learned outcome space, combined with the reconstruction error, is used to drive the search for new policies. Results show that TAXONS can find a diverse set of controllers, covering a good
Authors
(none)
Tags
Stats
Related papers
- Learning In Sparse Rewards Settings Through Quality-diversity Algorithms (2022)0.00
- Discovering And Exploiting Sparse Rewards In A Learned Behavior Space (2021)0.00
- Multi-objective Model-based Policy Search For Data-efficient Learning With Sparse Rewards (2018)0.00
- Dynamic Subgoal-based Exploration Via Bayesian Optimization (2019)0.00
- Selection-expansion: A Unifying Framework For Motion-planning And Diversity Search Algorithms (2021)0.00
- PNS: Population-guided Novelty Search For Reinforcement Learning In Hard Exploration Environments (2018)7.16
- Long-term Visitation Value For Deep Exploration In Sparse Reward Reinforcement Learning (2020)7.24
- Never Give Up: Learning Directed Exploration Strategies (2020)0.00