Relative Importance Sampling For Off-policy Actor-critic In Deep Reinforcement Learning
2018 · Mahammad Humayoo, Gengzhong Zheng, Xiaoqing Dong, et al.
Abstract
Off-policy learning exhibits greater instability when compared to on-policy learning in reinforcement learning (RL). The difference in probability distribution between the target policy (\(\pi\)) and the behavior policy (b) is a major cause of instability. High variance also originates from distributional mismatch. The variation between the target policy's distribution and the behavior policy's distribution can be reduced using importance sampling (IS). However, importance sampling has high variance, which is exacerbated in sequential scenarios. We propose a smooth form of importance sampling, specifically relative importance sampling (RIS), which mitigates variance and stabilizes learning. To control variance, we alter the value of the smoothness parameter \(\beta\in[0, 1]\) in RIS. We develop the first model-free relative importance sampling off-policy actor-critic (RIS-off-PAC) algorithms in RL using this strategy. Our method uses a network to generate the target policy (actor) and
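The abstract does not spell out the RIS weight itself. As a minimal sketch, assuming RIS takes the usual relative density-ratio form \(w_\beta = \pi(a|s) / (\beta\,\pi(a|s) + (1-\beta)\,b(a|s))\), the effect of the smoothness parameter can be illustrated as follows. The function names and the example probabilities are hypothetical and are not taken from the paper.

```python
def ordinary_importance_weight(pi_prob: float, b_prob: float) -> float:
    """Ordinary IS ratio pi(a|s) / b(a|s); unbounded as b_prob approaches 0."""
    if b_prob <= 0.0:
        raise ValueError("behavior probability must be positive")
    return pi_prob / b_prob


def relative_importance_weight(pi_prob: float, b_prob: float, beta: float) -> float:
    """Assumed RIS ratio: pi / (beta * pi + (1 - beta) * b).

    For beta > 0 the weight cannot exceed 1 / beta; beta = 0 reduces
    to the ordinary IS ratio.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    denom = beta * pi_prob + (1.0 - beta) * b_prob
    if denom <= 0.0:
        raise ValueError("mixture probability must be positive")
    return pi_prob / denom


if __name__ == "__main__":
    # Hypothetical action that the target policy favors
    # but the behavior policy rarely takes.
    pi_prob, b_prob = 0.9, 0.01
    print("IS weight:", ordinary_importance_weight(pi_prob, b_prob))
    for beta in (0.0, 0.1, 0.5, 1.0):
        w = relative_importance_weight(pi_prob, b_prob, beta)
        print(f"RIS weight (beta={beta}):", w)
```

Under this assumed form, a larger \(\beta\) caps the weight more tightly (at most \(1/\beta\)), which is one way to read the abstract's claim that \(\beta\) controls variance.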
Related papers
- Low Variance Off-policy Evaluation With State-based Importance Sampling (2022)
- Conditional Importance Sampling For Off-policy Learning (2019)
- Towards Optimal Off-policy Evaluation For Reinforcement Learning With Marginalized Importance Sampling (2019)
- On The Reuse Bias In Off-policy Reinforcement Learning (2022)
- Mitigating Off-policy Bias In Actor-critic Methods With One-step Q-learning: A Novel Correction Approach (2022)
- Sample Dropout: A Simple Yet Effective Variance Reduction Technique In Deep Policy Optimization (2023)
- Robust On-policy Sampling For Data-efficient Policy Evaluation In Reinforcement Learning (2021)
- Sample-efficient Model-free Reinforcement Learning With Off-policy Critics (2019)