Distance-rank Aware Sequential Reward Learning For Inverse Reinforcement Learning With Sub-optimal Demonstrations
2023 · Lu Li, Yuxin Pan, Ruobing Chen, et al.
Abstract
Inverse reinforcement learning (IRL) aims to explicitly infer an underlying reward function from collected expert demonstrations. Because expert demonstrations can be costly to obtain, current IRL techniques focus on learning a better-than-demonstrator policy from a reward function derived from sub-optimal demonstrations. However, existing IRL algorithms primarily address trajectory ranking ambiguity when learning the reward function. They overlook the degree of difference between trajectories' returns, which is essential for further removing reward ambiguity. Additionally, the reward of a single transition is heavily influenced by the contextual information within its trajectory. To address these issues, we introduce the Distance-rank Aware Sequential Reward Learning (DRASRL) framework. Unlike existing approaches, DRASRL takes into account both the ranking of trajectories
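The abstract's distinction between ranking trajectories and measuring the degree of difference between their returns can be illustrated with a toy sketch. This is not DRASRL's formulation; it assumes a reward that is linear in hypothetical per-step features and a hypothetical scalar score per trajectory whose differences stand in for the "degree of difference" between demonstrations. The helper names (`predicted_return`, `ranking_violations`, `gap_error`) are illustrative only.

```python
from itertools import combinations

# Illustrative sketch, not DRASRL itself. Assumptions: a reward that is linear in
# hypothetical per-step features, and a hypothetical scalar score per trajectory
# whose differences stand in for the "degree of difference" between demos.


def predicted_return(weights, trajectory):
    """Sum the linear per-step reward w . phi(step) over one trajectory."""
    return sum(sum(w * f for w, f in zip(weights, step)) for step in trajectory)


def ranking_violations(weights, trajectories, scores):
    """Count pairs whose predicted-return order disagrees with the score order."""
    returns = [predicted_return(weights, t) for t in trajectories]
    violations = 0
    for i, j in combinations(range(len(trajectories)), 2):
        # A pair agrees only when both differences share the same nonzero sign.
        if (scores[i] - scores[j]) * (returns[i] - returns[j]) <= 0:
            violations += 1
    return violations


def gap_error(weights, trajectories, scores):
    """Squared mismatch between predicted return gaps and score gaps, over all pairs."""
    returns = [predicted_return(weights, t) for t in trajectories]
    return sum(
        ((returns[i] - returns[j]) - (scores[i] - scores[j])) ** 2
        for i, j in combinations(range(len(trajectories)), 2)
    )


# Toy data (hypothetical): three trajectories of 2-D step features, worst to best.
TRAJS = [
    [(1.0, 0.0)],
    [(0.5, 0.5), (0.5, 0.5)],
    [(1.0, 0.0), (2.0, 1.0)],
]
SCORES = [1.0, 2.0, 4.0]
W_A = (1.0, 1.0)  # predicted returns 1, 2, 4
W_B = (0.5, 2.5)  # predicted returns 0.5, 3, 4

for name, w in (("W_A", W_A), ("W_B", W_B)):
    print(name, ranking_violations(w, TRAJS, SCORES), gap_error(w, TRAJS, SCORES))
```

Both candidate weight vectors order the three trajectories correctly (zero ranking violations), so a purely ranking-based criterion cannot tell them apart. Only `W_A` also reproduces the score gaps (gap error 0.0 versus 3.5 for `W_B`), which is the kind of residual ambiguity the abstract attributes to ignoring return differences.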
Related papers
- Inverse Reinforcement Learning With Missing Data (2019)
- Inverse Reinforcement Learning Without Reinforcement Learning (2023)
- Offline Inverse RL: New Solution Concepts And Provably Efficient Algorithms (2024)
- Is Inverse Reinforcement Learning Harder Than Standard Reinforcement Learning? A Theoretical Perspective (2023)
- Kernel Density Bayesian Inverse Reinforcement Learning (2023)
- Inverse Reinforcement Learning With Simultaneous Estimation Of Rewards And Dynamics (2016)
- Psiphi-learning: Reinforcement Learning With Demonstrations Using Successor Features And Inverse Temporal Difference Learning (2021)
- Task-guided Inverse Reinforcement Learning Under Partial Information (2021)