Hyperbolically-discounted Reinforcement Learning On Reward-punishment Framework
2021 · Taisuke Kobayashi
Abstract
This paper proposes a new reinforcement learning method with hyperbolic discounting. A new scheme for learning the optimal policy is derived by combining a new temporal difference (TD) error, which incorporates hyperbolic discounting in a recursive manner, with a reward-punishment framework. Simulations show that the proposed method outperforms standard reinforcement learning, although its performance depends on how the reward and punishment are designed. In addition, the average discount factors for reward and for punishment differ from each other, resembling the sign effect observed in animal behavior.
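To make the contrast with standard RL concrete, the following minimal Python sketch compares exponential discounting (weight γ^t) with hyperbolic discounting. It assumes the common hyperbolic form d(t) = 1 / (1 + k·t) with a hypothetical rate parameter `k`, and shows that this weight can be built recursively by multiplying a time-varying per-step factor. The reward-punishment split (positive rewards and negative punishments discounted with separate rates) is an illustrative simplification only; it is not the paper's TD error or learning rule, and every function name here is hypothetical.

```python
# Hypothetical illustration only: exponential vs. hyperbolic discounting.
# Assumes d(t) = 1 / (1 + k * t); NOT the paper's recursive TD-error formulation.


def exponential_weights(gamma: float, horizon: int) -> list[float]:
    """Standard discount weights gamma**t for t = 0..horizon-1."""
    return [gamma ** t for t in range(horizon)]


def hyperbolic_weights(k: float, horizon: int) -> list[float]:
    """Hyperbolic weights 1 / (1 + k * t), computed recursively.

    Each step multiplies by the time-varying factor
    (1 + k * t) / (1 + k * (t + 1)); the product telescopes to 1 / (1 + k * t).
    Unlike a constant gamma, this factor grows toward 1 as t increases.
    """
    weights = [1.0]
    for t in range(horizon - 1):
        step_factor = (1.0 + k * t) / (1.0 + k * (t + 1))
        weights.append(weights[-1] * step_factor)
    return weights


def discounted_return(rewards: list[float], weights: list[float]) -> float:
    """Sum of rewards weighted by the given discount schedule."""
    return sum(w * r for w, r in zip(weights, rewards))


def reward_punishment_return(
    rewards: list[float], k_reward: float, k_punish: float
) -> float:
    """Hypothetical split: discount positive rewards and negative
    punishments hyperbolically with separate rates, then sum them."""
    horizon = len(rewards)
    positive = [max(r, 0.0) for r in rewards]
    negative = [min(r, 0.0) for r in rewards]
    return discounted_return(
        positive, hyperbolic_weights(k_reward, horizon)
    ) + discounted_return(negative, hyperbolic_weights(k_punish, horizon))


if __name__ == "__main__":
    print(exponential_weights(0.9, 5))
    print(hyperbolic_weights(0.5, 5))
    print(reward_punishment_return([1.0, -1.0, 2.0], k_reward=0.5, k_punish=0.1))
```

With `k = 0.5`, the per-step factors are 2/3, 3/4, 4/5, 5/6, ..., so outcomes far in the future lose relatively less weight per step than near ones, whereas exponential discounting applies the same γ at every step. Giving reward and punishment different rates is one simple way to picture discount factors that differ by sign.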
Related papers
- Self Punishment And Reward Backfill For Deep Q-learning (2020)
- Reward Tweaking: Maximizing The Total Reward While Planning For Short Horizons (2020)
- Examining Average And Discounted Reward Optimality Criteria In Reinforcement Learning (2021)
- Average Reward Adjusted Discounted Reinforcement Learning: Near-blackwell-optimal Policies For Real-world Applications (2020)
- Learning Fair Policies In Multiobjective (deep) Reinforcement Learning With Average And Discounted Rewards (2020)
- Delayed Geometric Discounts: An Alternative Criterion For Reinforcement Learning (2022)
- Reward-conditioned Policies (2019)
- Analyzing And Bridging The Gap Between Maximizing Total Reward And Discounted Reward In Deep Reinforcement Learning (2024)