Bandit-based Policy Invariant Explicit Shaping For Incorporating External Advice In Reinforcement Learning
2023 · Yash Satsangi, Paniz Behboudian
Abstract
A key challenge for a reinforcement learning (RL) agent is to incorporate external (expert) advice into its learning. The desired goals of an algorithm that shapes the learning of an RL agent with external advice include (a) maintaining policy invariance, (b) accelerating the agent's learning, and (c) learning from arbitrary advice [3]. To address this challenge, this paper formulates the problem of incorporating external advice in RL as a multi-armed bandit problem called shaping-bandits. The reward of each arm of the shaping bandit is the return obtained either by following the expert or by following a default RL algorithm that learns on the true environment reward. We show that directly applying existing bandit and shaping algorithms that do not reason about the non-stationary nature of the underlying returns can lead to poor results. We therefore propose UCB-PIES (UPIES), Racing-PIES (RPIES), and Lazy PIES (LPIES): three shaping algorithms, built on different assumptions, that reason a
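The shaping-bandit framing, and why stationarity matters, can be illustrated with a small sketch. It assumes two arms (hypothetical labels `EXPERT` and `AGENT`), treats each episode return as the arm's reward, and picks arms with plain UCB1. This is not UPIES, RPIES, or LPIES; the optional per-arm sliding window is a swapped-in illustration of one generic way to discount stale returns, and the return values in `demo` are made up for illustration.

```python
import math
from collections import deque

EXPERT, AGENT = 0, 1  # hypothetical labels for the two shaping-bandit arms


class ShapingBanditUCB:
    """UCB1 over shaping-bandit arms; each reward is assumed to be an episode return.

    window=None: plain UCB1, every past return weighted equally (stationary assumption).
    window=k: keep only the last k returns per arm (illustrative sliding window,
    not one of the paper's algorithms).
    """

    def __init__(self, n_arms=2, c=2.0, window=None):
        self.c = c
        # deque(maxlen=None) is unbounded, so window=None keeps the full history.
        self.returns = [deque(maxlen=window) for _ in range(n_arms)]
        self.t = 0  # total number of arm pulls (episodes) so far

    def estimate(self, arm):
        """Mean of the stored returns for this arm."""
        r = self.returns[arm]
        return sum(r) / len(r)

    def select(self):
        # Try every arm once before relying on the confidence bound.
        for arm, r in enumerate(self.returns):
            if not r:
                return arm

        def ucb(arm):
            n = len(self.returns[arm])
            return self.estimate(arm) + math.sqrt(self.c * math.log(self.t) / n)

        return max(range(len(self.returns)), key=ucb)

    def update(self, arm, episode_return):
        self.returns[arm].append(episode_return)
        self.t += 1


def demo(window=None):
    """Hypothetical returns: the expert arm is fixed at 0.6, while the RL learner
    starts poorly and improves, so its returns are non-stationary."""
    bandit = ShapingBanditUCB(window=window)
    for g in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
        bandit.update(AGENT, g)
    for _ in range(6):
        bandit.update(EXPERT, 0.6)
    return bandit


if __name__ == "__main__":
    for window in (None, 3):
        b = demo(window)
        print(window, round(b.estimate(AGENT), 2), b.select())
```

With the full history, the learner's early poor returns pull its mean down to 0.5, so UCB1 keeps following the expert even though the learner now earns 1.0 per episode. Keeping only the last three returns raises the learner's estimate to 0.8 and the bandit switches to it. This is the kind of lag that motivates algorithms that explicitly reason about non-stationary returns.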