On The Effective Horizon Of Inverse Reinforcement Learning
2023 · Yiqing Xu, Finale Doshi-Velez, David Hsu
Abstract
Inverse reinforcement learning (IRL) algorithms often rely on (forward) reinforcement learning or planning, over a given time horizon, to compute an approximately optimal policy for a hypothesized reward function; they then match this policy with expert demonstrations. The time horizon plays a critical role in determining both the accuracy of reward estimates and the computational efficiency of IRL algorithms. Interestingly, an *effective time horizon* shorter than the ground-truth value often produces better results faster. This work formally analyzes this phenomenon and provides an explanation: the time horizon controls the complexity of an induced policy class and mitigates overfitting with limited data. This analysis provides a guide for the principled choice of the effective horizon for IRL. It also prompts us to re-examine the classic IRL formulation: it is more natural to jointly learn the reward and the effective horizon than to learn the reward alone under a given horizon. To val
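The abstract describes IRL as an outer search over hypothesized rewards wrapped around an inner planner that runs over a time horizon, and it argues for learning the reward and the effective horizon jointly. Below is a minimal toy sketch of that loop. It is not the authors' algorithm. Every component is an illustrative assumption: a hypothetical 6-state deterministic chain, one-hot "goal" candidate rewards, H-step finite-horizon value iteration as the planner, and exhaustive grid search scored by the fraction of expert state-action pairs the planned policy reproduces.

```python
from itertools import product

# Hypothetical toy MDP (an assumption, not from the paper): a deterministic 1-D chain.
N_STATES = 6
ACTIONS = (-1, +1)  # move left / move right


def step(s, a):
    """Deterministic transition, clipped at both ends of the chain."""
    return min(max(s + a, 0), N_STATES - 1)


def greedy_policy(reward, horizon):
    """Plan with H-step finite-horizon value iteration (horizon >= 1).

    reward[s] is received on entering state s. Returns, for every state, the
    first action of an optimal H-step plan; ties go to the first entry of ACTIONS.
    """
    v = [0.0] * N_STATES  # V_0(s) = 0
    for _ in range(horizon - 1):  # iterate the Bellman backup up to V_{H-1}
        v = [max(reward[step(s, a)] + v[step(s, a)] for a in ACTIONS)
             for s in range(N_STATES)]
    return [max(ACTIONS, key=lambda a: reward[step(s, a)] + v[step(s, a)])
            for s in range(N_STATES)]


def match_rate(policy, demos):
    """Fraction of expert (state, action) pairs that the policy reproduces."""
    return sum(policy[s] == a for s, a in demos) / len(demos)


def fit_reward_and_horizon(demos, candidate_rewards, horizons):
    """Jointly grid-search (horizon, reward); return (score, reward, horizon).

    Horizons are scanned in the given (increasing) order in the outer loop and
    only strict improvements replace the incumbent, so ties go to the shortest horizon.
    """
    best = None
    for h, reward in product(horizons, candidate_rewards):
        score = match_rate(greedy_policy(reward, h), demos)
        if best is None or score > best[0]:
            best = (score, reward, h)
    return best


# Hypothetical demonstrations: the expert always moves right, toward state 5.
demos = [(s, +1) for s in range(N_STATES - 1)]
# Hypothetical candidate reward class: one one-hot "goal" reward per state.
candidates = [[1 if s == g else 0 for s in range(N_STATES)] for g in range(N_STATES)]

score, reward, horizon = fit_reward_and_horizon(demos, candidates, range(1, 7))
print(score, reward, horizon)  # 1.0 [0, 0, 0, 0, 0, 1] 5
```

Here the planner only "sees" the goal from states within `horizon` steps of it, so shorter horizons induce a smaller set of distinct policies. Because ties resolve to the shortest horizon, the search returns 5, the smallest horizon that explains every demonstration, rather than a longer one that fits equally well. A method that actually addresses overfitting with limited data would need more than this tie-breaking rule, for example held-out demonstrations or an explicit complexity penalty.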
Related papers
- Towards Theoretical Understanding Of Inverse Reinforcement Learning (2023)
- Maximum-likelihood Inverse Reinforcement Learning With Finite-time Guarantees (2022)
- Is Inverse Reinforcement Learning Harder Than Standard Reinforcement Learning? A Theoretical Perspective (2023)
- Inverse Reinforcement Learning Without Reinforcement Learning (2023)
- In-trajectory Inverse Reinforcement Learning: Learn Incrementally Before An Ongoing Trajectory Terminates (2024)
- A Survey Of Inverse Reinforcement Learning: Challenges, Methods And Progress (2018)
- Offline Inverse RL: New Solution Concepts And Provably Efficient Algorithms (2024)
- Active Exploration For Inverse Reinforcement Learning (2022)