Evaluating Feature Dependent Noise In Preference-based Reinforcement Learning
2026 · Yuxuan Li, Harshith Reddy Kethireddy, Srijita Das
Abstract
Preference-based Reinforcement Learning (PbRL) has gained attention recently as a natural fit for complicated tasks where the reward function is not easily available. However, preferences often come with uncertainty and noise when they do not come from perfect teachers. Much prior work has aimed to detect noise, but it considers only limited types of noise, most of them uniformly distributed with no connection to observations. In this work, we formalize the notion of targeted feature-dependent noise and propose several variants, including trajectory feature noise, trajectory similarity noise, margin-dependent noise, and Language Model noise. We evaluate feature-dependent noise, where the noise is correlated with certain features, in complex continuous control tasks from DMControl and Meta-world. Our experiments show that in some feature-dependent noise settings, the learning performance of the state-of-the-art noise-robust PbRL method deteriorates significantly, while PbRL method with no
Related papers
- Boosting Robustness In Preference-based Reinforcement Learning With Dynamic Sparsity (2024)
- Dueling RL: Reinforcement Learning With Trajectory Preferences (2021)
- RA-PbRL: Provably Efficient Risk-aware Preference-based Reinforcement Learning (2024)
- Data Driven Reward Initialization For Preference Based Reinforcement Learning (2023)
- Hindsight Priors For Reward Learning From Human Preferences (2024)
- Listwise Reward Estimation For Offline Preference-based Reinforcement Learning (2024)
- Tell Me Why: Training Preferences-based RL With Human Preferences And Step-level Explanations (2024)
- Symbol Guided Hindsight Priors For Reward Learning From Human Preferences (2022)