BAMDP Shaping: A Unified Framework For Intrinsic Motivation And Reward Shaping
2024 · Aly Lidayan, Michael Dennis, Stuart Russell
Abstract
Intrinsic motivation and reward shaping guide reinforcement learning (RL) agents by adding pseudo-rewards, which can lead to useful emergent behaviors. However, they can also encourage counterproductive exploits, e.g., fixation on noisy TV screens. Here we provide a theoretical model that anticipates these behaviors and provides broad criteria under which adverse effects can be bounded. We characterize all pseudo-rewards as reward shaping in Bayes-Adaptive Markov Decision Processes (BAMDPs), which formulate the problem of learning in MDPs as an MDP over the agent's knowledge. Optimal exploration maximizes BAMDP state value, which we decompose into the value of the information gathered and the prior value of the physical state. Pseudo-rewards guide RL agents by rewarding behavior that increases these value components, while they hinder exploration when they align poorly with the actual value. We extend potential-based shaping theory to prove BAMDP Potential-based shaping Functions
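To make "potential-based shaping over the agent's knowledge" concrete, here is a minimal sketch, not the authors' method. It assumes a hypothetical two-armed Bernoulli bandit whose BAMDP state is just the Beta posterior counts per arm, and it applies the standard potential-based form F(h, h') = γΦ(h') − Φ(h) with an illustrative potential Φ (the best posterior-mean reward). The names `BeliefState`, `potential`, `shaping_reward`, and `rollout`, as well as the arm probabilities, are hypothetical. The sketch checks the property that makes potential-based shaping safe: along any trajectory the discounted shaping rewards telescope to γ^T Φ(h_T) − Φ(h_0), whatever actions are taken.

```python
import math
import random
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BeliefState:
    """Hypothetical BAMDP state for a Bernoulli bandit.

    The physical state is trivial, so the agent's knowledge is fully
    captured by Beta(alpha_i, beta_i) posterior counts for each arm i.
    """
    alpha: Tuple[int, ...]
    beta: Tuple[int, ...]

    def update(self, arm: int, reward: int) -> "BeliefState":
        # Bayesian update of the chosen arm's Beta posterior.
        alpha = list(self.alpha)
        beta = list(self.beta)
        if reward:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        return BeliefState(tuple(alpha), tuple(beta))


def potential(h: BeliefState) -> float:
    # Illustrative potential (an assumption): best posterior-mean reward.
    return max(a / (a + b) for a, b in zip(h.alpha, h.beta))


def shaping_reward(h: BeliefState, h_next: BeliefState, gamma: float) -> float:
    # Potential-based shaping term F(h, h') = gamma * Phi(h') - Phi(h).
    return gamma * potential(h_next) - potential(h)


def rollout(true_probs, steps, gamma, seed=0):
    """Run a uniformly random policy from a uniform Beta(1, 1) prior.

    Returns the discounted sum of shaping rewards plus the first and
    last belief states.
    """
    rng = random.Random(seed)
    n = len(true_probs)
    h0 = BeliefState(alpha=(1,) * n, beta=(1,) * n)
    h = h0
    total = 0.0
    for t in range(steps):
        arm = rng.randrange(n)
        reward = 1 if rng.random() < true_probs[arm] else 0
        h_next = h.update(arm, reward)
        total += gamma ** t * shaping_reward(h, h_next, gamma)
        h = h_next
    return total, h0, h


if __name__ == "__main__":
    gamma, steps = 0.9, 20
    total, h0, hT = rollout(true_probs=(0.3, 0.7), steps=steps, gamma=gamma)
    # Telescoping: the sum collapses to gamma^T * Phi(h_T) - Phi(h_0).
    expected = gamma ** steps * potential(hT) - potential(h0)
    print(math.isclose(total, expected, abs_tol=1e-12))  # True
```

Because the shaping contribution depends only on the start and end belief states, it cannot change which policy is optimal; a pseudo-reward that is not of this form carries no such guarantee.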
Related papers
- Highly Efficient Self-adaptive Reward Shaping For Reinforcement Learning (2024)
- Unpacking Reward Shaping: Understanding The Benefits Of Reward Engineering On Sample Complexity (2022)
- Learning To Shape Rewards Using A Game Of Two Partners (2021)
- Shaping Advice In Deep Reinforcement Learning (2022)
- A New Potential-based Reward Shaping For Reinforcement Learning Agent (2019)
- Subgoal-based Reward Shaping To Improve Efficiency In Reinforcement Learning (2021)
- Environment Shaping In Reinforcement Learning Using State Abstraction (2020)
- Automatic Intrinsic Reward Shaping For Exploration In Deep Reinforcement Learning (2023)