Listwise Reward Estimation For Offline Preference-based Reinforcement Learning
2024 · Heewoong Choi, Sangwon Jung, Hongjoon Ahn, et al.
Abstract
In Reinforcement Learning (RL), designing precise reward functions remains a challenge, particularly when aligning with human intent. Preference-based RL (PbRL) was introduced to address this problem by learning reward models from human feedback. However, existing PbRL methods are limited because they often overlook second-order preference, which indicates the relative strength of preference. In this paper, we propose Listwise Reward Estimation (LiRE), a novel approach for offline PbRL that leverages second-order preference information by constructing a Ranked List of Trajectories (RLT), which can be built efficiently using the same ternary feedback type as traditional methods. To validate the effectiveness of LiRE, we propose a new offline PbRL dataset that objectively reflects the effect of the estimated rewards. Our extensive experiments on the dataset demonstrate the superiority of LiRE, i.e., outperforming state-of-the-art baselines even with modest feedback budgets and
Related papers
- Hindsight Priors For Reward Learning From Human Preferences (2024)
- Symbol Guided Hindsight Priors For Reward Learning From Human Preferences (2022)
- Provably Efficient Offline Reinforcement Learning With Trajectory-wise Reward (2022)
- Efficient Preference-based Reinforcement Learning Via Aligned Experience Estimation (2024)
- In-dataset Trajectory Return Regularization For Offline Preference-based Reinforcement Learning (2024)
- Data Driven Reward Initialization For Preference Based Reinforcement Learning (2023)
- RA-PbRL: Provably Efficient Risk-aware Preference-based Reinforcement Learning (2024)
- Model-based Offline Reinforcement Learning With Lower Expectile Q-learning (2024)