Learning To Shape Rewards Using A Game Of Two Partners
2021 · David Mguni, Taher Jafferjee, Jianhong Wang, et al.
Abstract
Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge, which runs contrary to the goal of autonomous learning. We introduce the Reinforcement Learning Optimising Shaping Algorithm (ROSA), an automated reward shaping framework in which the shaping-reward function is constructed in a Markov game between two agents. A reward-shaping agent (Shaper) uses switching controls to determine in which states to add shaping rewards for more efficient learning, while the other agent (Controller) learns the optimal policy for the task using these shaped rewards. We prove that ROSA, which adopts existing RL algorithms, learns to construct a shaping-reward function that is beneficial to the task, thus ensuring efficient convergence to high-performance policies. We demons
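The abstract describes the two-agent mechanism only at a high level. The sketch below is a hypothetical toy, not the authors' implementation, meant to show how a Shaper with on/off switching controls and a Controller can interact through a shaped reward. Everything concrete in it is an assumption: the 8-state chain task, tabular Q-learning for both agents, a fixed potential `phi(s) = s / (N - 1)` used in the potential-based form `gamma*phi(s') - phi(s)`, and a Shaper reward equal to the task reward plus the shaping it added minus a fixed `SWITCH_COST`. ROSA's Shaper, as described above, learns to construct the shaping reward itself; this sketch fixes the potential and learns only where to switch shaping on.

```python
import random

# Hypothetical toy task (not from the paper): a 1-D chain in which only the
# right-most state pays a reward, so the task reward is sparse.
N_STATES = 8
GAMMA = 0.95
ALPHA = 0.1          # learning rate shared by both tabular learners
EPSILON = 0.2        # exploration rate for both agents
SWITCH_COST = 0.01   # assumed cost the Shaper pays each time it switches shaping on
ACTIONS = (-1, 1)    # Controller: move left or right
SWITCH = (0, 1)      # Shaper: 0 = shaping off, 1 = shaping on


def env_step(s, a):
    """Apply a move; return (next_state, sparse task reward, done)."""
    s_next = min(max(s + a, 0), N_STATES - 1)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done


def potential(s):
    """Assumed fixed potential phi(s): normalised progress along the chain."""
    return s / (N_STATES - 1)


def shaping(s, s_next, switched_on):
    """Potential-based term gamma*phi(s') - phi(s), only where the switch is on."""
    return GAMMA * potential(s_next) - potential(s) if switched_on else 0.0


def eps_greedy(q_row, rng):
    """Random key with prob. EPSILON, else a greedy key (ties broken at random)."""
    keys = list(q_row)
    if rng.random() < EPSILON:
        return rng.choice(keys)
    best = max(q_row.values())
    return rng.choice([k for k in keys if q_row[k] == best])


def q_update(q, s, choice, reward, s_next, done):
    """One tabular Q-learning step towards reward + gamma * max_b Q(s', b)."""
    target = reward + (0.0 if done else GAMMA * max(q[s_next].values()))
    q[s][choice] += ALPHA * (target - q[s][choice])


def train(episodes=500, max_steps=100, seed=0):
    rng = random.Random(seed)
    q_ctrl = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    q_shaper = [{w: 0.0 for w in SWITCH} for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            w = eps_greedy(q_shaper[s], rng)   # Shaper: switch shaping on here?
            a = eps_greedy(q_ctrl[s], rng)     # Controller: pick a task action
            s_next, r, done = env_step(s, a)
            f = shaping(s, s_next, w == 1)
            # Controller learns from the shaped reward r + f.
            q_update(q_ctrl, s, a, r + f, s_next, done)
            # Shaper's reward (an assumption): task reward plus the shaping it
            # added, minus a fixed cost whenever it switched on.
            q_update(q_shaper, s, w, r + f - SWITCH_COST * w, s_next, done)
            s = s_next
            if done:
                break
    return q_ctrl, q_shaper


def greedy_policy(q_ctrl):
    """Controller's greedy action in every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q_ctrl[s][a]) for s in range(N_STATES - 1)]
```

In this toy, calling `train()` and then `greedy_policy(q_ctrl)` should give a rightward move in every non-terminal state: the shaping term is positive for rightward moves and negative for leftward ones, so where the Shaper switches it on it points the Controller the same way as the sparse task reward. Tabular Q-learning is used only for brevity; since ROSA is described as adopting existing RL algorithms, either agent could in principle use a different learner.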
Related papers
- ORSO: Accelerating Reward Design Via Online Reward Selection And Policy Optimization (2024)
- Highly Efficient Self-adaptive Reward Shaping For Reinforcement Learning (2024)
- BAMDP Shaping: A Unified Framework For Intrinsic Motivation And Reward Shaping (2024)
- Action Guidance: Getting The Best Of Sparse Rewards And Shaped Rewards For Real-time Strategy Games (2020)
- Subgoal-based Reward Shaping To Improve Efficiency In Reinforcement Learning (2021)
- Unpacking Reward Shaping: Understanding The Benefits Of Reward Engineering On Sample Complexity (2022)
- On The Fundamental Limitations Of Decentralized Learnable Reward Shaping In Cooperative Multi-agent Reinforcement Learning (2025)
- Reward Design For Reinforcement Learning Agents (2025)