Using Monte Carlo Tree Search As A Demonstrator Within Asynchronous Deep RL
2018 · Bilal Kartal, Pablo Hernandez-Leal, Matthew E. Taylor
Abstract
Deep reinforcement learning (DRL) has achieved great success in recent years, helped by novel methods and greater compute power. However, several challenges remain, such as convergence to locally optimal policies and long training times. In this paper, we first augment the Asynchronous Advantage Actor-Critic (A3C) method with a novel self-supervised auxiliary task, *Terminal Prediction*, which measures temporal closeness to terminal states; we call the resulting method A3C-TP. Second, we propose a new framework in which planning algorithms such as Monte Carlo tree search, or other sources of (simulated) demonstrators, can be integrated into asynchronous distributed DRL methods. Compared to vanilla A3C, both proposed methods learn faster and converge to better policies on a two-player mini version of the Pommerman game.
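The abstract does not define the Terminal Prediction target precisely, so the following is only a minimal sketch under stated assumptions: the target for step t of a finished episode of length T is taken to be t / T, the auxiliary loss is a mean squared error, and it is added to the A3C loss with a weight. The function names and the `tp_weight` parameter are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence


def terminal_prediction_targets(episode_length: int) -> List[float]:
    """Assumed self-supervised target: step t (1-indexed) maps to t / T,
    so the value approaches 1.0 as the episode nears its terminal state."""
    if episode_length <= 0:
        raise ValueError("episode_length must be positive")
    return [t / episode_length for t in range(1, episode_length + 1)]


def terminal_prediction_loss(predictions: Sequence[float],
                             targets: Sequence[float]) -> float:
    """Mean squared error between predicted and assumed closeness values."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(predictions, targets)) / len(targets)


def combined_loss(a3c_loss: float, tp_loss: float, tp_weight: float = 0.5) -> float:
    """Hypothetical weighting: auxiliary loss added to the base A3C loss."""
    return a3c_loss + tp_weight * tp_loss
```

Because the targets can only be computed once an episode has ended, a sketch like this would be applied to stored trajectories after termination rather than online at every step.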
Related papers
- Monte Carlo Augmented Actor-critic For Sparse Reward Deep Reinforcement Learning From Suboptimal Demonstrations (2022)
- A3C-S: Automated Agent Accelerator Co-search Towards Efficient Deep Reinforcement Learning (2021)
- A Human Mixed Strategy Approach To Deep Reinforcement Learning (2018)
- Toward Interpretable Deep Reinforcement Learning With Linear Model U-trees (2018)
- Decision Making In Non-stationary Environments With Policy-augmented Monte Carlo Tree Search (2022)
- Efficient Exploration In Deep Reinforcement Learning: A Novel Bayesian Actor-critic Algorithm (2024)
- Bayes Adaptive Monte Carlo Tree Search For Offline Model-based Reinforcement Learning (2024)
- Mitigating Estimation Errors By Twin TD-regularized Actor And Critic For Deep Reinforcement Learning (2023)