A Survey Of Reinforcement Learning From Human Feedback
2023 Β· Timo Kaufmann, Paul Weng, Viktor Bengs, et al.
Abstract
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning provides a promising approach to enhance the performance and adaptability of intelligent systems while also improving the alignment of their objectives with human values. The success in training large language models (LLMs) has impressively demonstrated this potential in recent years, where RLHF has played a decisive role in directing the model's capabilities towards human objectives. This article provides an overview of the fundamentals of RLHF, exploring how RL agents interact with human feedback. While recent focus has been on RLHF for LLMs, our survey covers the technique across multiple domains. We provide
Authors
(none)
Tags
Stats
Related papers
- Mapping Out The Space Of Human Feedback For Reinforcement Learning: A Conceptual Framework (2024)0.00
- Reinforcement Learning In The Era Of Llms: What Is Essential? What Is Needed? An RL Perspective On RLHF, Prompting, And Beyond (2023)0.00
- Aligning Humans And Robots Via Reinforcement Learning From Implicit Human Feedback (2025)2.26
- A Survey On Enhancing Reinforcement Learning In Complex Environments: Insights From Human And LLM Feedback (2024)0.00
- The Alignment Ceiling: Objective Mismatch In Reinforcement Learning From Human Feedback (2023)0.00
- Towards Reinforcement Learning From Neural Feedback: Mapping Fnirs Signals To Agent Performance (2025)0.00
- Perspectives On The Social Impacts Of Reinforcement Learning With Human Feedback (2023)0.00
- Principled Reinforcement Learning With Human Feedback From Pairwise Or \(k\)-wise Comparisons (2023)0.00