ORSO: Accelerating Reward Design Via Online Reward Selection And Policy Optimization
2024 · Chen Bo Calvin Zhang, Zhang-Wei Hong, Aldo Pacchiano, et al.
Abstract
Reward shaping is critical in reinforcement learning (RL), particularly for complex tasks where sparse rewards can hinder learning. However, choosing effective shaping rewards from a set of reward functions in a computationally efficient manner remains an open challenge. We propose Online Reward Selection and Policy Optimization (ORSO), a novel approach that frames the selection of a shaping reward function as an online model selection problem. ORSO automatically identifies performant shaping reward functions without human intervention and comes with provable regret guarantees. We demonstrate ORSO's effectiveness across various continuous control tasks. Compared to prior approaches, ORSO significantly reduces the amount of data required to evaluate a shaping reward function, resulting in superior data efficiency and a significant reduction in computational time (up to 8 times). ORSO consistently identifies high-quality reward functions, outperforming prior methods by more than 50%, and on average id
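To make the framing in the abstract concrete, here is a minimal sketch of treating reward selection as an online selection problem. It is not the authors' algorithm: it uses a generic UCB-style rule as a stand-in, and the names `select_reward_ucb`, `make_toy_trainer`, and the `train_and_evaluate` callback are hypothetical. Each candidate shaping reward is assumed to have its own policy, and each round spends a fixed training budget on one candidate and then scores that policy under the true task objective.

```python
import math


def select_reward_ucb(train_and_evaluate, num_candidates, num_rounds, exploration=1.0):
    """UCB-style selection over candidate shaping rewards (illustrative only).

    train_and_evaluate(i) is a hypothetical callback: it trains the policy
    attached to candidate reward i for a fixed budget and returns that
    policy's current score under the true task objective.
    """
    counts = [0] * num_candidates    # training rounds given to each candidate
    latest = [0.0] * num_candidates  # most recent task score of each candidate
    history = []

    for t in range(1, num_rounds + 1):
        untried = [i for i in range(num_candidates) if counts[i] == 0]
        if untried:
            # Give every candidate one round before comparing them.
            choice = untried[0]
        else:
            # Latest score plus an exploration bonus that shrinks with use.
            choice = max(
                range(num_candidates),
                key=lambda i: latest[i]
                + exploration * math.sqrt(math.log(t) / counts[i]),
            )
        latest[choice] = train_and_evaluate(choice)
        counts[choice] += 1
        history.append(choice)

    best = max(range(num_candidates), key=lambda i: latest[i])
    return best, counts, history


def make_toy_trainer(ceilings, rate=0.5):
    """Toy stand-in for RL training: each round moves a candidate's score
    a fraction `rate` of the way toward its (assumed) performance ceiling."""
    progress = [0.0] * len(ceilings)

    def train_and_evaluate(i):
        progress[i] += rate * (ceilings[i] - progress[i])
        return progress[i]

    return train_and_evaluate


if __name__ == "__main__":
    trainer = make_toy_trainer([0.2, 1.0, 0.5])
    best, counts, _ = select_reward_ucb(trainer, num_candidates=3, num_rounds=60)
    print("selected candidate:", best, "rounds per candidate:", counts)
```

The sketch tracks the latest score rather than a running mean because a policy's performance changes as it trains, so older evaluations understate a candidate that is still improving. This is a simplification chosen for the example, not a claim about how ORSO estimates reward quality.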
Related papers
- Learning To Shape Rewards Using A Game Of Two Partners (2021)
- Reward Design For Reinforcement Learning Agents (2025)
- Highly Efficient Self-adaptive Reward Shaping For Reinforcement Learning (2024)
- Unpacking Reward Shaping: Understanding The Benefits Of Reward Engineering On Sample Complexity (2022)
- Optimistic Curiosity Exploration And Conservative Exploitation With Linear Reward Shaping (2022)
- Reward Shaping For Human Learning Via Inverse Reinforcement Learning (2020)
- Designing Rewards For Fast Learning (2022)
- Continuously Discovering Novel Strategies Via Reward-switching Policy Optimization (2022)