Active Teacher Selection For Reinforcement Learning From Human Feedback
2023 · Rachel Freedman, Justin Svegliato, Kyle Wray, et al.
Abstract
Reinforcement learning from human feedback (RLHF) enables machine learning systems to learn objectives from human feedback. A core limitation of these systems is their assumption that all feedback comes from a single human teacher, despite querying a range of distinct teachers. We propose the Hidden Utility Bandit (HUB) framework to model differences in teacher rationality, expertise, and costliness, formalizing the problem of learning from multiple teachers. We develop a variety of solution algorithms and apply them to two real-world domains: paper recommendation systems and COVID-19 vaccine testing. We find that the Active Teacher Selection (ATS) algorithm outperforms baseline algorithms by actively selecting when and which teacher to query. The HUB framework and ATS algorithm demonstrate the importance of leveraging differences between teachers to learn accurate reward models, facilitating future research on active teacher selection for robust reward modeling.
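The abstract describes teachers that differ in rationality and costliness but gives no formulas, so the following is only a minimal sketch under stated assumptions. It assumes a Boltzmann-rational preference model, in which a teacher with rationality `beta` prefers item a over item b with probability sigmoid(beta * (u_a - u_b)), and a fixed per-query `cost`. The names `Teacher`, `prefer_prob`, `query`, and `value_per_cost` are hypothetical and do not come from the paper. The `value_per_cost` score is a crude greedy heuristic for illustration, not the ATS algorithm.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Teacher:
    """Hypothetical teacher model (an assumption, not the paper's code)."""
    beta: float  # rationality: larger values give less noisy answers
    cost: float  # price paid for each query to this teacher


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def prefer_prob(teacher: Teacher, u_a: float, u_b: float) -> float:
    """Probability the teacher reports 'a is better than b' (assumed Boltzmann model)."""
    return sigmoid(teacher.beta * (u_a - u_b))


def query(teacher: Teacher, u_a: float, u_b: float, rng: random.Random) -> bool:
    """Sample one noisy preference; True means the teacher picked item a."""
    return rng.random() < prefer_prob(teacher, u_a, u_b)


def value_per_cost(teacher: Teacher, u_a: float, u_b: float) -> float:
    """Illustrative heuristic: margin above chance of a correct answer, per unit cost."""
    p_correct = prefer_prob(teacher, max(u_a, u_b), min(u_a, u_b))
    return (p_correct - 0.5) / teacher.cost


# Two hypothetical teachers: an accurate but expensive expert and a cheap novice.
expert = Teacher(beta=5.0, cost=4.0)
novice = Teacher(beta=1.0, cost=1.0)

# Pick the teacher with the better heuristic score for items with utilities 1.0 and 0.0.
chosen = max([expert, novice], key=lambda t: value_per_cost(t, 1.0, 0.0))
answer = query(chosen, 1.0, 0.0, random.Random(0))
```

With these made-up numbers the expert answers correctly more often, yet the cheaper novice scores higher per unit cost. This is the kind of trade-off between teachers that the abstract says a selection strategy must weigh.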
Related papers
- Provably Feedback-efficient Reinforcement Learning Via Active Reward Learning (2023)
- TGRL: An Algorithm For Teacher Guided Reinforcement Learning (2023)
- A Survey Of Reinforcement Learning From Human Feedback (2023)
- Mapping Out The Space Of Human Feedback For Reinforcement Learning: A Conceptual Framework (2024)
- Human AI Interaction Loop Training: New Approach For Interactive Reinforcement Learning (2020)
- Humans Are Not Boltzmann Distributions: Challenges And Opportunities For Modelling Human Feedback And Interaction In Reinforcement Learning (2022)
- Aligning Humans And Robots Via Reinforcement Learning From Implicit Human Feedback (2025)
- Improving Multimodal Interactive Agents With Reinforcement Learning From Human Feedback (2022)