Accelerating Goal-directed Reinforcement Learning By Model Characterization
2019 · Shoubhik Debnath, Gaurav Sukhatme, Lantao Liu
Abstract
We propose a hybrid approach for improving sample efficiency in goal-directed reinforcement learning. The approach has two steps: first, we approximate a model from Model-Free reinforcement learning; then, we use this approximate model, together with a notion of reachability based on Mean First Passage Times, to perform Model-Based reinforcement learning. Building on this observation, we design two new algorithms, Mean First Passage Time based Q-Learning (MFPT-Q) and Mean First Passage Time based DYNA (MFPT-DYNA), which are fundamentally modified versions of state-of-the-art reinforcement learning techniques. Preliminary results show that our hybrid approaches converge in far fewer iterations than their state-of-the-art counterparts and therefore require far fewer samples and training trials.
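To make the reachability notion in the abstract concrete, here is a minimal sketch of how mean first passage times to a goal state can be computed for a known Markov chain. It is not the authors' MFPT-Q or MFPT-DYNA algorithm: the function name `mean_first_passage_times`, the 4-state transition matrix `P`, and the choice of goal state are all assumptions made for illustration. The sketch uses the standard identity m_i = 1 + sum_j Q_ij m_j, where Q is the transition matrix restricted to the non-goal states.

```python
def mean_first_passage_times(P, goal):
    """Expected number of steps to first reach `goal` from every state.

    P is a row-stochastic transition matrix (list of lists) of a Markov
    chain. This sketch assumes `goal` is reachable from every state;
    otherwise the linear system below is singular.
    """
    others = [s for s in range(len(P)) if s != goal]
    k = len(others)
    # Augmented system (I - Q) m = 1, with Q the non-goal block of P.
    A = [
        [(1.0 if i == j else 0.0) - P[others[i]][others[j]] for j in range(k)]
        + [1.0]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: goal may be unreachable")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    mfpt = [0.0] * len(P)  # the goal itself has passage time 0
    for i, s in enumerate(others):
        mfpt[s] = A[i][k] / A[i][i]
    return mfpt


# Hypothetical 4-state chain: each state stays put or moves one step
# toward the goal (state 3) with equal probability.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
mfpt = mean_first_passage_times(P, goal=3)
print(mfpt)  # [6.0, 4.0, 2.0, 0.0]
```

States with smaller values are closer to the goal in expected steps. In a hybrid scheme like the one described, `P` would presumably be estimated from the transitions observed during model-free learning rather than given in advance.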
Related papers
- A Model-based Approach For Sample-efficient Multi-task Reinforcement Learning (2019)
- Accelerating Imitation Learning With Predictive Models (2018)
- Sample-efficient Reinforcement Learning Is Feasible For Linearly Realizable MDPs With Limited Revisiting (2021)
- A Hybrid PAC Reinforcement Learning Algorithm (2020)
- Approximating Two Value Functions Instead Of One: Towards Characterizing A New Family Of Deep Reinforcement Learning Algorithms (2019)
- Learn A Flexible Exploration Model For Parameterized Action Markov Decision Processes (2025)
- Model-based Adaptation For Sample Efficient Transfer In Reinforcement Learning Control Of Parameter-varying Systems (2023)
- Using Forwards-backwards Models To Approximate MDP Homomorphisms (2022)