Multi-objective Model-based Policy Search For Data-efficient Learning With Sparse Rewards
2018 · Rituraj Kaushik, Konstantinos Chatzilygeroudis, Jean-Baptiste Mouret
Abstract
The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. However, current algorithms lack an effective exploration strategy for sparse or misleading reward scenarios: if they do not experience any state with a positive reward during the initial random exploration, they are very unlikely to solve the problem. Here, we propose a novel model-based policy search algorithm, Multi-DEX, that leverages a learned dynamical model to efficiently explore the task space and solve tasks with sparse rewards in a few episodes. To achieve this, we frame the policy search problem as a multi-objective, model-based policy optimization problem with three objectives: (1) generate maximally novel state trajectories, (2) maximize the expected return and (3) keep the system in state-spa
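The multi-objective framing can be illustrated with a minimal sketch, assuming each candidate policy has already been rolled out in the learned model and scored on three objectives to be maximized. The names `novelty`, `dominates`, `pareto_front` and `model_confidence` are hypothetical. Novelty is taken here as the distance from a predicted trajectory to its nearest neighbour among previously executed trajectories (an assumption). Because the abstract is cut off before fully stating the third objective, it is stood in for by a generic "model confidence" score. The selection step is plain Pareto-dominance filtering, not necessarily the optimizer the authors use.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types: a trajectory is a sequence of states (each a sequence of floats);
# objectives are (novelty, expected_return, model_confidence), all maximized.
Trajectory = Sequence[Sequence[float]]
Objectives = Tuple[float, float, float]


def novelty(traj: Trajectory, archive: List[Trajectory]) -> float:
    """Distance from a predicted trajectory to its nearest neighbour in the
    archive of past trajectories (trajectories assumed to have equal length)."""
    if not archive:
        return math.inf  # anything is maximally novel before the first episode

    def dist(a: Trajectory, b: Trajectory) -> float:
        # Euclidean distance over all concatenated state dimensions
        return math.sqrt(sum((x - y) ** 2
                             for sa, sb in zip(a, b)
                             for x, y in zip(sa, sb)))

    return min(dist(traj, other) for other in archive)


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Objectives]) -> List[int]:
    """Indices of candidate policies that no other candidate dominates."""
    return [i for i, s in enumerate(scores)
            if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)]


if __name__ == "__main__":
    archive = [[(0.0, 0.0), (1.0, 0.0)]]
    print(novelty([(0.0, 0.0), (1.0, 1.0)], archive))  # 1.0
    scores = [(1.0, 0.0, 0.5), (0.5, 0.0, 0.5), (0.0, 1.0, 0.9)]
    print(pareto_front(scores))  # [0, 2]: candidate 1 is dominated by candidate 0
```

Keeping the whole front, rather than a weighted sum of the three scores, lets a highly novel policy with zero expected return survive alongside a high-return one, which is what makes exploration possible when rewards are sparse.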
Related papers
- DQN With Model-based Exploration: Efficient Learning On Environments With Sparse Rewards (2019)
- A Study On Dense And Sparse (visual) Rewards In Robot Policy Learning (2021)
- Learning Self-imitating Diverse Policies (2018)
- Efficient Model-based Reinforcement Learning Through Optimistic Policy Search And Planning (2020)
- Discovering And Exploiting Sparse Rewards In A Learned Behavior Space (2021)
- Dynamic Subgoal-based Exploration Via Bayesian Optimization (2019)
- REMAX: Relational Representation For Multi-agent Exploration (2020)
- Coordinated Exploration Via Intrinsic Rewards For Multi-agent Reinforcement Learning (2019)