Mixed Policy Gradient: Off-policy Reinforcement Learning Driven Jointly By Data And Model
2021 · Yang Guan, Jingliang Duan, Shengbo Eben Li, et al.
Abstract
Reinforcement learning (RL) shows great potential in sequential decision-making. At present, mainstream RL algorithms are data-driven; they usually yield better asymptotic performance but converge much more slowly than model-driven methods. This paper proposes the mixed policy gradient (MPG) algorithm, which fuses empirical data and the transition model in the policy gradient (PG) to accelerate convergence without performance degradation. Formally, MPG is constructed as a weighted average of the data-driven and model-driven PGs, where the former is the derivative of the learned Q-value function and the latter is the derivative of the model-predictive return. To guide the weight design, we analyze and compare the upper bound of each PG error. Based on that analysis, a rule-based method is employed to heuristically adjust the weights. In particular, to obtain a better PG, the weight of the data-driven PG is designed to grow over the course of learning while the weight of the model-driven PG decreases. Simulation results show
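The weighted-average construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `data_weight` and `mixed_policy_gradient` are hypothetical, the gradients are plain lists of floats standing in for real PG estimates, and the linear schedule is only an assumed placeholder for the paper's rule-based weight adjustment, whose details are not given here.

```python
from typing import List, Sequence


def data_weight(iteration: int, total_iterations: int) -> float:
    """Hypothetical schedule (an assumption, not the paper's rule):
    the data-driven weight grows linearly from 0 to 1 during training."""
    if total_iterations <= 0:
        raise ValueError("total_iterations must be positive")
    # Clamp to [0, 1] so the two weights always form a convex combination.
    return min(max(iteration / total_iterations, 0.0), 1.0)


def mixed_policy_gradient(
    data_pg: Sequence[float],
    model_pg: Sequence[float],
    w_data: float,
) -> List[float]:
    """Weighted average of a data-driven and a model-driven PG estimate."""
    if len(data_pg) != len(model_pg):
        raise ValueError("gradient estimates must have the same length")
    if not 0.0 <= w_data <= 1.0:
        raise ValueError("w_data must lie in [0, 1]")
    w_model = 1.0 - w_data  # the model-driven weight shrinks as w_data grows
    return [w_data * d + w_model * m for d, m in zip(data_pg, model_pg)]
```

Early in training (`w_data` near 0) the mixed gradient follows the model-driven estimate; late in training (`w_data` near 1) it follows the data-driven one, matching the qualitative behavior the abstract describes.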
Related papers
- Merging Deterministic Policy Gradient Estimations With Varied Bias-variance Tradeoff For Effective Deep Reinforcement Learning (2019)
- Interpolated Policy Gradient: Merging On-policy And Off-policy Gradient Estimation For Deep Reinforcement Learning (2017)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- PC-PG: Policy Cover Directed Exploration For Provable Policy Gradient Learning (2020)
- Policy Gradient Algorithms With Monte Carlo Tree Learning For Non-markov Decision Processes (2022)
- Combining Policy Gradient And Q-learning (2016)
- Smoothing Policies And Safe Policy Gradients (2019)
- MEPG: A Minimalist Ensemble Policy Gradient Framework For Deep Reinforcement Learning (2021)