Doubly Robust Off-policy Actor-critic Algorithms For Reinforcement Learning
2019 · Riashat Islam, Raihan Seraj, Samin Yeasar Arnob, et al.
Abstract
We study the problem of off-policy critic evaluation in several variants of value-based off-policy actor-critic algorithms. Off-policy actor-critic algorithms require an off-policy critic evaluation step to estimate the value of the new policy after every policy gradient update. Despite the enormous success of off-policy policy gradients on control tasks, existing general methods suffer from high variance and instability, partly because the policy improvement depends on the gradient of the estimated value function. In this work, we present a new approach to off-policy policy evaluation in actor-critic, based on doubly robust estimators. We extend the doubly robust estimator from off-policy policy evaluation (OPE) to actor-critic algorithms that consist of a reward estimator performance model. We find that doubly robust estimation of the critic can significantly improve performance in continuous control tasks. Furthermore, in cases where the reward function is stochastic that can lead to high v
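As a rough illustration of the doubly robust idea the abstract builds on, the sketch below implements the standard one-step (contextual-bandit style) doubly robust value estimate for a single state: a model-based baseline plus an importance-weighted correction on the model's residual. It is not the paper's actor-critic algorithm; the function name `doubly_robust_value`, the dictionary-based policies, and the toy numbers are all assumptions made for illustration.

```python
def doubly_robust_value(samples, target_probs, q_hat):
    """One-step doubly robust estimate of a target policy's value.

    samples:      list of (action, reward, behavior_prob) tuples logged
                  by the behavior policy (hypothetical data layout).
    target_probs: dict mapping action -> probability under the target policy.
    q_hat:        dict mapping action -> estimated reward (the reward model).
    """
    # Model-based term: expected estimated reward under the target policy.
    baseline = sum(target_probs[a] * q_hat[a] for a in target_probs)

    total = 0.0
    for action, reward, behavior_prob in samples:
        # Importance weight between target and behavior policies.
        rho = target_probs[action] / behavior_prob
        # Correct the model's prediction with the weighted residual.
        total += baseline + rho * (reward - q_hat[action])
    return total / len(samples)


# Toy setup (assumed): two actions, uniform behavior policy,
# deterministic rewards of 1.0 for action 0 and 0.0 for action 1.
target = {0: 0.8, 1: 0.2}
logged = [(0, 1.0, 0.5), (1, 0.0, 0.5)]

accurate_model = {0: 1.0, 1: 0.0}   # reward model is exact
poor_model = {0: 0.0, 1: 0.0}       # reward model is wrong everywhere

print(doubly_robust_value(logged, target, accurate_model))
print(doubly_robust_value(logged, target, poor_model))
```

In this toy case both calls recover the same target-policy value: with an exact reward model the correction term vanishes, and with a poor model the importance-weighted residual compensates, provided the behavior probabilities are correct. That is the sense in which the estimator is "doubly robust".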
Related papers
- Sample-efficient Model-free Reinforcement Learning With Off-policy Critics (2019)
- How To Learn A Useful Critic? Model-based Action-gradient-estimator Policy Optimization (2020)
- Local Advantage Actor-critic For Robust Multi-agent Deep Reinforcement Learning (2021)
- Revisiting Stochastic Off-policy Action-value Gradients (2017)
- Doubly Robust Off-policy Value And Gradient Estimation For Deterministic Policies (2020)
- A Multi-agent Off-policy Actor-critic Algorithm For Distributed Reinforcement Learning (2019)
- More Efficient Off-policy Evaluation Through Regularized Targeted Learning (2019)
- Actor-critic Policy Optimization In Partially Observable Multiagent Environments (2018)