TGRL: An Algorithm For Teacher Guided Reinforcement Learning
2023 · Idan Shenfeld, Zhang-Wei Hong, Aviv Tamar, et al.
Abstract
Learning from rewards (i.e., reinforcement learning or RL) and learning to imitate a teacher (i.e., teacher-student learning) are two established approaches for solving sequential decision-making problems. To combine the benefits of these different forms of learning, it is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives. However, lacking a principled method for balancing these objectives, prior work relied on heuristics and problem-specific hyperparameter searches. We present a principled approach, along with an approximate implementation, for dynamically and automatically balancing when to follow the teacher and when to use rewards. The main idea is to adjust the importance of teacher supervision by comparing the agent's performance to the counterfactual scenario of the agent learning without teacher supervision and only from rewards. If using teacher supervisi
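The balancing idea stated in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract is cut off before it gives the update rule, so the additive update, the clipping range, and every name here (`update_teacher_weight`, `combined_loss`, `step_size`) are assumptions made only for illustration. The sketch raises the weight on the teacher term when the teacher-guided agent outperforms a reward-only counterfactual agent, and lowers it otherwise.

```python
def update_teacher_weight(weight, return_with_teacher, return_reward_only,
                          step_size=0.1, min_weight=0.0, max_weight=1.0):
    """Adjust the teacher-supervision weight from a performance comparison.

    Assumption (not from the source): a simple additive update on the
    difference between the two returns, clipped to [min_weight, max_weight].
    """
    # Positive gap: the teacher-guided agent is doing better than the
    # reward-only counterfactual, so teacher supervision gets more weight.
    gap = return_with_teacher - return_reward_only
    new_weight = weight + step_size * gap
    return min(max_weight, max(min_weight, new_weight))


def combined_loss(rl_loss, teacher_loss, weight):
    """Combine the two objectives; the weight scales the teacher term."""
    return rl_loss + weight * teacher_loss


if __name__ == "__main__":
    weight = 0.5
    # Hypothetical average returns for the two agents over recent episodes.
    weight = update_teacher_weight(weight, return_with_teacher=10.0,
                                   return_reward_only=8.0)
    print("teacher weight:", weight)
    print("loss:", combined_loss(rl_loss=1.0, teacher_loss=2.0, weight=weight))
```

In a full training loop, the reward-only return would have to come from a second policy trained without the teacher term, which is the counterfactual comparison the abstract refers to. How that policy is trained and how often the weight is refreshed are left open here.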
Related papers
- Reward Design For Reinforcement Learning Agents (2025)
- Improving Interactive Reinforcement Learning: What Makes A Good Teacher? (2019)
- Provably Feedback-efficient Reinforcement Learning Via Active Reward Learning (2023)
- Active Teacher Selection For Reinforcement Learning From Human Feedback (2023)
- Replacing Rewards With Examples: Example-based Policy Search Via Recursive Classification (2021)
- Discovering Reinforcement Learning Algorithms (2020)
- Knowledge Transfer From Teachers To Learners In Growing-batch Reinforcement Learning (2023)
- Reinforcement Learning By Guided Safe Exploration (2023)