Reinforcement Learning From Diverse Human Preferences
2023 · Wanqi Xue, Bo An, Shuicheng Yan, et al.
Abstract
The complexity of designing reward functions has been a major obstacle to the wide application of deep reinforcement learning (RL) techniques. Describing an agent's desired behaviors and properties can be difficult, even for experts. A new paradigm called reinforcement learning from human preferences (or preference-based RL) has emerged as a promising solution, in which reward functions are learned from human preference labels among behavior trajectories. However, existing methods for preference-based RL are limited by the need for accurate oracle preference labels. This paper addresses this limitation by developing a method for crowd-sourcing preference labels and learning from diverse human preferences. The key idea is to stabilize reward learning through regularization and correction in a latent space. To ensure temporal consistency, a strong constraint is imposed on the reward model that forces its latent space to be close to the prior distribution. Additionally, a confidence-based
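The two ingredients the abstract names, a reward learned from preference labels and a latent space pulled toward a prior, can be illustrated with a small sketch. This is not the paper's implementation. It assumes a Bradley-Terry preference model (common in preference-based RL but not stated in the abstract), a diagonal Gaussian latent with a standard normal prior, and a hypothetical weight `beta`; all function and parameter names are illustrative.

```python
import math


def _softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_loss(reward_sum_a, reward_sum_b, label):
    """Cross-entropy under an assumed Bradley-Terry model.

    reward_sum_a / reward_sum_b: predicted reward summed over each segment.
    label: 1.0 if segment A is preferred, 0.0 if B, 0.5 for a tie.
    """
    diff = reward_sum_a - reward_sum_b
    log_p_a = -_softplus(-diff)  # log sigmoid(diff)
    log_p_b = -_softplus(diff)   # log sigmoid(-diff)
    return -(label * log_p_a + (1.0 - label) * log_p_b)


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(
        m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def regularized_loss(reward_sum_a, reward_sum_b, label, mu, log_var, beta=1.0):
    # beta is a hypothetical knob: larger values push the latent code
    # harder toward the assumed standard normal prior.
    fit = preference_loss(reward_sum_a, reward_sum_b, label)
    return fit + beta * kl_to_standard_normal(mu, log_var)
```

In this sketch, a large `beta` corresponds to the "strong constraint" wording in the abstract: the latent code is kept near the prior, which limits how far a single noisy or conflicting label can move the reward model.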
Related papers
- Batch Reinforcement Learning From Crowds (2021)
- Deep Reinforcement Learning From Hierarchical Preference Design (2023)
- A General Theoretical Paradigm To Understand Learning From Human Preferences (2023)
- Symbol Guided Hindsight Priors For Reward Learning From Human Preferences (2022)
- REBEL: Reward Regularization-based Approach For Robotic Reinforcement Learning From Human Feedback (2023)
- Hindsight Priors For Reward Learning From Human Preferences (2024)
- Tell Me Why: Training Preferences-based RL With Human Preferences And Step-level Explanations (2024)
- Online Iterative Reinforcement Learning From Human Feedback With General Preference Model (2024)