REBEL: Reward Regularization-based Approach For Robotic Reinforcement Learning From Human Feedback
2023 · Souradip Chakraborty, Anukriti Singh, Amisha Bhaskar, et al.
Abstract
The effectiveness of reinforcement learning (RL) agents in continuous control robotics tasks depends mainly on the design of the underlying reward function, which is highly prone to reward hacking. A misalignment between the reward function and underlying human preferences (values, social norms) can lead to catastrophic outcomes in the real world, especially in robotics for critical decision-making. Recent methods aim to mitigate misalignment by learning reward functions from human preferences and then performing policy optimization. However, these methods inadvertently introduce a distribution shift during reward learning because they ignore the dependence of agent-generated trajectories on the reward learning objective, ultimately resulting in sub-optimal alignment. Hence, in this work, we address this challenge by advocating for the adoption of regularized reward functions that more accurately mirror the intended behaviors of the agent. We propose a novel conc
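The abstract contrasts REBEL with methods that first learn a reward from human preference labels and then optimize a policy against it. A common way to fit such a reward is the Bradley-Terry model, in which the probability that trajectory segment a is preferred over segment b is σ(R(a) − R(b)). The sketch below fits a linear reward by gradient descent on that likelihood plus a plain L2 penalty. The linear features, the L2 term, and the names `train_reward`, `preference_prob`, and `preference_loss` are illustrative assumptions. The L2 penalty is a generic stand-in and is not the regularizer REBEL proposes, which the truncated abstract does not specify.

```python
import math
from typing import List, Sequence, Tuple

# A trajectory segment is a list of per-step feature vectors (assumed representation).
Trajectory = List[Sequence[float]]
# (segment_a, segment_b, label): label 1.0 means a was preferred, 0.0 means b was.
Pair = Tuple[Trajectory, Trajectory, float]


def _softplus(x: float) -> float:
    # log(1 + exp(x)) computed without overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def feature_sum(traj: Trajectory, dim: int) -> List[float]:
    # Sum of features over the segment; with a linear reward, R = w . feature_sum.
    total = [0.0] * dim
    for feats in traj:
        for i, f in enumerate(feats):
            total[i] += f
    return total


def return_gap(w: List[float], a: Trajectory, b: Trajectory) -> float:
    # R(a) - R(b) under the linear reward r(s) = w . phi(s).
    fa, fb = feature_sum(a, len(w)), feature_sum(b, len(w))
    return sum(wi * (x - y) for wi, x, y in zip(w, fa, fb))


def preference_prob(w: List[float], a: Trajectory, b: Trajectory) -> float:
    # Bradley-Terry: P(a preferred over b) = sigmoid(R(a) - R(b)).
    return _sigmoid(return_gap(w, a, b))


def preference_loss(w: List[float], pairs: List[Pair], reg_weight: float) -> float:
    # Mean negative log-likelihood of the labels plus reg_weight * ||w||^2.
    nll = 0.0
    for a, b, y in pairs:
        d = return_gap(w, a, b)
        # -log sigmoid(d) = softplus(-d); -log(1 - sigmoid(d)) = softplus(d)
        nll += y * _softplus(-d) + (1.0 - y) * _softplus(d)
    return nll / len(pairs) + reg_weight * sum(wi * wi for wi in w)


def train_reward(pairs: List[Pair], dim: int, lr: float = 0.1,
                 reg_weight: float = 0.01, epochs: int = 200) -> List[float]:
    # Full-batch gradient descent on preference_loss.
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [2.0 * reg_weight * wi for wi in w]  # gradient of the L2 penalty
        for a, b, y in pairs:
            p = preference_prob(w, a, b)
            fa, fb = feature_sum(a, dim), feature_sum(b, dim)
            for i in range(dim):
                grad[i] += -(y - p) * (fa[i] - fb[i]) / len(pairs)
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Toy data: feature 0 is "task progress", feature 1 is irrelevant.
    good = [[1.0, 0.0], [1.0, 1.0]]
    bad = [[0.0, 1.0], [0.0, 0.0]]
    pairs = [(good, bad, 1.0), (bad, good, 0.0)]
    w = train_reward(pairs, dim=2)
    print(w, preference_prob(w, good, bad))
```

On this toy data the learned weight on the progress feature is positive while the irrelevant feature stays at zero. Raising `reg_weight` shrinks the learned weights toward zero, which is the simplest way a regularization term changes what the fitted reward can express.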
Related papers
- Regularization Matters In Policy Optimization (2019)
- Average Reward Reinforcement Learning For Omega-regular And Mean-payoff Objectives (2025)
- Provably Feedback-efficient Reinforcement Learning Via Active Reward Learning (2023)
- Aligning Humans And Robots Via Reinforcement Learning From Implicit Human Feedback (2025)
- Reinforcement Learning From Diverse Human Preferences (2023)
- Reward Design For Reinforcement Learning Agents (2025)
- Disturbing Reinforcement Learning Agents With Corrupted Rewards (2021)
- The Effects Of Reward Misspecification: Mapping And Mitigating Misaligned Models (2022)