RA-PbRL: Provably Efficient Risk-aware Preference-based Reinforcement Learning
2024 · Yujie Zhao, Jose Efraim Aguilar Escamill, Weyl Lu, et al.
Abstract
Reinforcement Learning from Human Feedback (RLHF) has recently surged in popularity, particularly for aligning large language models and other AI systems with human intentions. At its core, RLHF can be viewed as a specialized instance of Preference-based Reinforcement Learning (PbRL), where the preferences specifically originate from human judgments rather than arbitrary evaluators. Despite this connection, most existing approaches in both RLHF and PbRL primarily focus on optimizing a mean reward objective, neglecting scenarios that necessitate risk-awareness, such as AI safety, healthcare, and autonomous driving. These scenarios often operate under a one-episode-reward setting, in which a single reward is assigned to the whole episode rather than to individual steps, which makes conventional risk-sensitive objectives inapplicable. To address this, we explore and prove the applicability of two risk-aware objectives to PbRL: nested and static quantile risk objectives. We also introduce Risk-Aware PbRL (RA-PbRL), an algorithm designed to optimize both nested and static objectives…
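To make the difference between a mean objective, a static risk objective, and a nested risk objective concrete, the following is a minimal Python sketch on a toy two-step tree where the reward is revealed only at the end of the episode. It assumes lower-tail CVaR at level α as the quantile risk measure and evaluates a fixed policy with known episode rewards; learning from preference feedback is not modeled. The tree and the helpers `Node`, `cvar`, `static_cvar`, and `nested_cvar` are hypothetical illustrations, not the paper's formal definitions or the RA-PbRL algorithm, which may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node; leaves carry the one-episode reward."""
    reward: Optional[float] = None  # set only on leaves (end of episode)
    children: List[Tuple[float, "Node"]] = field(default_factory=list)  # (transition prob, child)


def cvar(values: List[float], probs: List[float], alpha: float) -> float:
    """Lower-tail CVaR_alpha: expected value over the worst alpha-fraction of probability mass."""
    remaining, total = alpha, 0.0
    for v, p in sorted(zip(values, probs)):  # worst (lowest) outcomes first
        take = min(p, remaining)
        total += take * v
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


def leaf_distribution(node: Node, prob: float = 1.0) -> List[Tuple[float, float]]:
    """Enumerate (episode reward, path probability) pairs for every leaf."""
    if not node.children:
        return [(node.reward, prob)]
    out = []
    for p, child in node.children:
        out.extend(leaf_distribution(child, prob * p))
    return out


def mean_return(root: Node) -> float:
    """Risk-neutral objective: expected episode reward."""
    return sum(r * p for r, p in leaf_distribution(root))


def static_cvar(root: Node, alpha: float) -> float:
    """Static objective (assumed form): one risk measure on the whole-episode reward distribution."""
    rewards, probs = zip(*leaf_distribution(root))
    return cvar(list(rewards), list(probs), alpha)


def nested_cvar(node: Node, alpha: float) -> float:
    """Nested objective (assumed form): apply the risk measure at every step, backing values up the tree."""
    if not node.children:
        return node.reward
    probs = [p for p, _ in node.children]
    values = [nested_cvar(child, alpha) for _, child in node.children]
    return cvar(values, probs, alpha)


# Toy example: the root branches into a risky subtree A and a safer subtree B.
A = Node(children=[(0.5, Node(reward=0.0)), (0.5, Node(reward=10.0))])
B = Node(children=[(0.5, Node(reward=4.0)), (0.5, Node(reward=6.0))])
root = Node(children=[(0.5, A), (0.5, B)])

print(mean_return(root))       # 5.0
print(static_cvar(root, 0.5))  # 2.0
print(nested_cvar(root, 0.5))  # 0.0
```

Under these assumptions the nested objective is the most conservative of the three here: it first shrinks the risky subtree A to its worst tail (0.0) and then applies the risk measure again at the root, whereas the static objective looks only once at the distribution of complete-episode rewards.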
Related papers
- Epistemic Risk-sensitive Reinforcement Learning (2019)
- Hindsight Priors For Reward Learning From Human Preferences (2024)
- Symbol Guided Hindsight Priors For Reward Learning From Human Preferences (2022)
- Boosting Robustness In Preference-based Reinforcement Learning With Dynamic Sparsity (2024)
- A Survey Of Reinforcement Learning From Human Feedback (2023)
- Data Driven Reward Initialization For Preference Based Reinforcement Learning (2023)
- Query-policy Misalignment In Preference-based Reinforcement Learning (2023)
- Provably Efficient Iterated CVaR Reinforcement Learning With Function Approximation And Human Feedback (2023)