SIBRE: Self Improvement Based Rewards For Adaptive Feedback In Reinforcement Learning
2020 · Somjit Nath, Richa Verma, Abhik Ray, et al.
Abstract
We propose a generic reward shaping approach for improving the rate of convergence in reinforcement learning (RL), called Self Improvement Based REwards, or SIBRE. The approach is designed for use in conjunction with any existing RL algorithm, and consists of rewarding improvement over the agent's own past performance. We prove that SIBRE converges in expectation under the same conditions as the original RL algorithm. The reshaped rewards help discriminate between policies when the original rewards are weakly discriminated or sparse. Experiments on several well-known benchmark environments with different RL algorithms show that SIBRE converges to the optimal policy faster and more stably. We also perform sensitivity analysis with respect to hyper-parameters, in comparison with baseline RL algorithms.
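The abstract describes the idea (rewarding improvement over the agent's own past performance) but does not give the update rule. The following is a minimal sketch of one plausible reading, assuming the shaped reward is the episodic return minus a running baseline of past returns, and that the baseline is an exponential moving average. The class name `SelfImprovementShaper`, the step size `beta`, and the zero initial baseline are all hypothetical choices for illustration, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class SelfImprovementShaper:
    """Hypothetical helper: scores an episode by how much its return
    exceeds a running baseline of the agent's past returns."""

    beta: float = 0.1      # assumed step size for the baseline update
    baseline: float = 0.0  # running estimate of past episodic return

    def shape(self, episode_return: float) -> float:
        # Shaped reward: improvement over past performance.
        shaped = episode_return - self.baseline
        # Move the baseline toward the latest return (exponential moving average).
        self.baseline += self.beta * shaped
        return shaped


if __name__ == "__main__":
    shaper = SelfImprovementShaper(beta=0.5)
    for episode_return in [10.0, 10.0, 5.0]:
        print(shaper.shape(episode_return), shaper.baseline)
```

Under this reading, an agent that merely repeats its past return sees the shaped reward shrink toward zero, while a drop in return yields a negative signal. The shaped value would be passed to the underlying RL algorithm in place of the raw return.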
Related papers
- Highly Efficient Self-adaptive Reward Shaping For Reinforcement Learning (2024)
- Automatic Intrinsic Reward Shaping For Exploration In Deep Reinforcement Learning (2023)
- Constrained Policy Improvement For Safe And Efficient Reinforcement Learning (2018)
- Iterative Reward Shaping Using Human Feedback For Correcting Reward Misspecification (2023)
- ORSO: Accelerating Reward Design Via Online Reward Selection And Policy Optimization (2024)
- Policy Improvement Reinforcement Learning (2026)
- Reward Shaping For Human Learning Via Inverse Reinforcement Learning (2020)
- S-REINFORCE: A Neuro-symbolic Policy Gradient Approach For Interpretable Reinforcement Learning (2023)