Multi-step Greedy Reinforcement Learning Algorithms
2019 · Manan Tomar, Yonathan Efroni, Mohammad Ghavamzadeh
Abstract
Multi-step greedy policies have been extensively used in model-based reinforcement learning (RL), both when a model of the environment is available (e.g., in the game of Go) and when it is learned. In this paper, we explore their benefits in model-free RL, when employed using multi-step dynamic programming algorithms: \(\kappa\)-Policy Iteration (\(\kappa\)-PI) and \(\kappa\)-Value Iteration (\(\kappa\)-VI). These methods iteratively compute the next policy (\(\kappa\)-PI) and value function (\(\kappa\)-VI) by solving a surrogate decision problem with a shaped reward and a smaller discount factor. We derive model-free RL algorithms based on \(\kappa\)-PI and \(\kappa\)-VI in which the surrogate problem can be solved by any discrete or continuous action RL method, such as DQN and TRPO. We identify the importance of a hyper-parameter that controls the extent to which the surrogate problem is solved and suggest a way to set this parameter. When evaluated on a range of Atari and MuJoCo ben…
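The abstract only says the surrogate problem has "a shaped reward and a smaller discount factor". The sketch below assumes the \(\kappa\)-greedy formulation from the multi-step dynamic programming literature: for a value estimate \(v\) and \(\kappa \in [0,1]\), the surrogate MDP has reward \(r(s,a) + (1-\kappa)\gamma\, v(s')\) and discount \(\kappa\gamma\), so \(\kappa = 0\) recovers the ordinary one-step greedy step and \(\kappa = 1\) recovers the original problem. It runs \(\kappa\)-VI (replace \(v\) by the optimal surrogate value) and \(\kappa\)-PI (take the optimal surrogate policy for \(v^{\pi}\), then evaluate it exactly) on a small tabular MDP with a known model. It therefore illustrates the dynamic programming schemes, not the paper's model-free DQN/TRPO variants. All function names are hypothetical, and `n_inner` (a cap on inner Bellman sweeps) is only a loose, assumed stand-in for the hyper-parameter that controls how far the surrogate problem is solved.

```python
import numpy as np


def make_random_mdp(n_states, n_actions, seed=0):
    """Random tabular MDP: P[s, a, s'] transition probs, R[s, a] rewards."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)  # normalise each (s, a) row to a distribution
    R = rng.random((n_states, n_actions))
    return P, R


def solve_surrogate(P, R, v, gamma, kappa, n_inner=200):
    """Approximately solve the assumed kappa-surrogate MDP by value iteration.

    Assumed surrogate: reward R[s, a] + (1 - kappa) * gamma * E[v(s')],
    discount kappa * gamma. n_inner (hypothetical) caps the inner sweeps.
    Returns the surrogate-greedy policy and the surrogate optimal value.
    """
    shaped = R + (1.0 - kappa) * gamma * (P @ v)  # shaped reward, shape (S, A)
    w = np.zeros(P.shape[0])
    for _ in range(n_inner):
        w = (shaped + kappa * gamma * (P @ w)).max(axis=1)
    q = shaped + kappa * gamma * (P @ w)
    return q.argmax(axis=1), q.max(axis=1)


def evaluate(P, R, pi, gamma):
    """Exact policy evaluation in the original MDP: v = (I - gamma P_pi)^-1 r_pi."""
    idx = np.arange(P.shape[0])
    P_pi, r_pi = P[idx, pi], R[idx, pi]
    return np.linalg.solve(np.eye(P.shape[0]) - gamma * P_pi, r_pi)


def kappa_vi(P, R, gamma, kappa, iters=200, n_inner=200):
    """kappa-VI: repeatedly replace v by the optimal value of its surrogate MDP."""
    v = np.zeros(P.shape[0])
    for _ in range(iters):
        _, v = solve_surrogate(P, R, v, gamma, kappa, n_inner)
    return v


def kappa_pi(P, R, gamma, kappa, iters=50, n_inner=200):
    """kappa-PI: improve via the surrogate, evaluate in the original MDP."""
    pi = np.zeros(P.shape[0], dtype=int)
    v = evaluate(P, R, pi, gamma)
    for _ in range(iters):
        new_pi, _ = solve_surrogate(P, R, v, gamma, kappa, n_inner)
        if np.array_equal(new_pi, pi):  # pi is kappa-greedy w.r.t. its own value
            break
        pi = new_pi
        v = evaluate(P, R, pi, gamma)
    return pi, v


if __name__ == "__main__":
    P, R = make_random_mdp(6, 3)
    for kappa in (0.0, 0.5, 0.9):
        pi, v = kappa_pi(P, R, gamma=0.9, kappa=kappa)
        print(f"kappa={kappa}: policy={pi}, v[0]={v[0]:.4f}")
```

Every \(\kappa\) should reach the same optimal value; what changes is the trade-off. A larger \(\kappa\) gives a harder surrogate problem (discount \(\kappa\gamma\)), while the outer \(\kappa\)-VI contraction factor \(\gamma(1-\kappa)/(1-\kappa\gamma)\) shrinks, so fewer outer iterations are needed.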
Related papers
- Learning Self-imitating Diverse Policies (2018)
- Conservative Optimistic Policy Optimization Via Multiple Importance Sampling (2021)
- Dual Policy Iteration (2018)
- Constrained Policy Improvement For Safe And Efficient Reinforcement Learning (2018)
- DDPG++: Striving For Simplicity In Continuous-control Off-policy Reinforcement Learning (2020)
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018)
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016)
- Understanding The Pathologies Of Approximate Policy Evaluation When Combined With Greedification In Reinforcement Learning (2020)