Self Punishment And Reward Backfill For Deep Q-learning
2020 · Mohammad Reza Bonyadi, Rui Wang, Maryam Ziaei
Abstract
Reinforcement learning agents learn by encouraging behaviours that maximize their total reward, usually provided by the environment. In many environments, however, the reward is provided after a series of actions rather than after each single action, leaving the agent uncertain about which of those actions were effective, an issue known as the credit assignment problem. In this paper, we propose two strategies inspired by behavioural psychology to enable the agent to intrinsically estimate more informative reward values for actions with no reward. The first strategy, called self-punishment (SP), discourages the agent from making mistakes that lead to undesirable terminal states. The second strategy, called reward backfill (RB), backpropagates the rewards between two rewarded actions. We prove that, under certain assumptions and regardless of the reinforcement learning algorithm used, these two strategies maintain the order of policies in the space of all possible poli
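The abstract names the two strategies but gives no formulas, so the following is only a minimal sketch of how they might be applied to the reward sequence of one finished episode. Everything specific here is an assumption made for illustration rather than the authors' method: the function names `self_punish` and `backfill_rewards`, the `penalty` and `decay` parameters, the use of geometric decay, the treatment of a zero reward as "no reward", and the choice to backfill the punishment as well.

```python
from typing import List


def self_punish(rewards: List[float], terminal_is_failure: bool,
                penalty: float = -1.0) -> List[float]:
    """Assumed form of self-punishment (SP): if the episode ended in an
    undesirable terminal state and the last action got no reward,
    assign that action a negative reward."""
    shaped = list(rewards)
    if terminal_is_failure and shaped and shaped[-1] == 0.0:
        shaped[-1] = penalty
    return shaped


def backfill_rewards(rewards: List[float], decay: float = 0.9) -> List[float]:
    """Assumed form of reward backfill (RB): walk the episode backwards and
    give each unrewarded action a decayed copy of the next rewarded one."""
    filled = list(rewards)
    carried = 0.0  # nothing to propagate until a rewarded action is seen
    for t in range(len(filled) - 1, -1, -1):
        if filled[t] != 0.0:
            carried = filled[t]  # a rewarded action resets the carried value
        else:
            carried *= decay     # one more step away from that reward
            filled[t] = carried
    return filled


# Hypothetical episode: one reward at step 2, then a failure at the end.
episode = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
shaped = backfill_rewards(self_punish(episode, terminal_is_failure=True),
                          decay=0.5)
print(shaped)  # [0.25, 0.5, 1.0, -0.25, -0.5, -1.0]
```

In a deep Q-learning setup, shaped rewards of this kind would replace the raw rewards of the episode's transitions before they are stored in the replay buffer. Whether the paper applies the two strategies in this order is not stated in the abstract.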
Related papers
- Learning Self-imitating Diverse Policies (2018)
- Hyperbolically-discounted Reinforcement Learning On Reward-punishment Framework (2021)
- Deep PQR: Solving Inverse Reinforcement Learning Using Anchor Actions (2020)
- Curious Exploration And Return-based Memory Restoration For Deep Reinforcement Learning (2021)
- Handling Cost And Constraints With Off-policy Deep Reinforcement Learning (2023)
- Reinforcement Learning From Imperfect Corrective Actions And Proxy Rewards (2024)
- Learning Long-term Reward Redistribution Via Randomized Return Decomposition (2021)
- Adaptive Symmetric Reward Noising For Reinforcement Learning (2019)