Gap-increasing Policy Evaluation For Efficient And Noise-tolerant Reinforcement Learning
2019 · Tadashi Kozuno, Dongqi Han, Kenji Doya
Abstract
In real-world applications of reinforcement learning (RL), noise from the inherent stochasticity of environments is inevitable. However, current policy evaluation algorithms, which play a key role in many RL algorithms, are either prone to noise or inefficient. To solve this issue, we introduce a novel policy evaluation algorithm, which we call Gap-increasing RetrAce Policy Evaluation (GRAPE). It leverages two recent ideas: (1) gap-increasing value update operators from advantage learning, for noise tolerance, and (2) off-policy eligibility traces from the Retrace algorithm, for efficient learning. We provide a detailed theoretical analysis of the new algorithm that shows its efficiency and noise tolerance, inherited from Retrace and advantage learning. Furthermore, our analysis shows that GRAPE learns significantly more efficiently than a simple learning-rate-based approach while keeping the same level of noise tolerance. We applied GRAPE to control problems and obtained experimental results su
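The two ingredients named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GRAPE update from the paper: it uses the truncated importance weights c_t = λ·min(1, π/μ) as they are commonly stated for Retrace, and an advantage-learning-style target that adds α·(Q(s,a) − V(s)) to an ordinary backup. The function names, the argument layout, and the choice of V(s) as the expectation of Q under the evaluated policy are all hypothetical choices made for this example.

```python
def retrace_coefficients(pi_probs, mu_probs, lam=1.0):
    """Truncated importance weights c_t = lam * min(1, pi_t / mu_t).

    pi_probs[t] and mu_probs[t] are the target- and behavior-policy
    probabilities of the action actually taken at step t (assumed layout).
    """
    return [lam * min(1.0, p / m) for p, m in zip(pi_probs, mu_probs)]


def retrace_correction(td_errors, coeffs, gamma):
    """Off-policy trace sum: sum_t gamma^t * (prod_{s=1..t} c_s) * delta_t.

    The coefficient at t = 0 is not used, because the first action is the
    one whose value is being corrected.
    """
    total, weight = 0.0, 1.0
    for t, delta in enumerate(td_errors):
        if t > 0:
            weight *= gamma * coeffs[t]
        total += weight * delta
    return total


def gap_increasing_target(backup, q_sa, v_s, alpha):
    """Advantage-learning-style target: backup + alpha * (Q(s,a) - V(s)).

    Assumption for policy evaluation: v_s is the expectation of Q(s, .)
    under the evaluated policy. With alpha = 0 this is the plain backup.
    """
    return backup + alpha * (q_sa - v_s)


if __name__ == "__main__":
    c = retrace_coefficients([0.5, 0.9, 0.2], [0.5, 0.3, 0.4], lam=1.0)
    print(c)
    print(retrace_correction([1.0, 1.0, 1.0], c, gamma=0.5))
    print(gap_increasing_target(backup=3.0, q_sa=2.0, v_s=1.5, alpha=0.5))
```

Clipping the ratio at 1 keeps the trace weights bounded even when the behavior policy rarely takes an action, while the α term widens the gap between an action's value and the state's average value.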
Related papers
- Improving Policy Gradient By Exploring Under-appreciated Rewards (2016) 0.00
- Beyond Expected Return: Accounting For Policy Reproducibility When Evaluating Reinforcement Learning Algorithms (2023) 3.58
- Noise-corrected GRPO: From Noisy Rewards To Unbiased Gradients (2025) 0.00
- Evaluation-aware Reinforcement Learning (2025) 0.00
- Constrained Policy Improvement For Safe And Efficient Reinforcement Learning (2018) 0.00
- Action Noise In Off-policy Deep Reinforcement Learning: Impact On Exploration And Performance (2022) 0.00
- Non-uniform Noise-to-signal Ratio In The REINFORCE Policy-gradient Estimator (2026) 0.00
- Accelerating Residual Reinforcement Learning With Uncertainty Estimation (2025) 0.00