Maximum-likelihood Inverse Reinforcement Learning With Finite-time Guarantees
2022 · Siliang Zeng, Chenliang Li, Alfredo Garcia, et al.
Abstract
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy that best fits observed sequences of states and actions implemented by an expert. Many algorithms for IRL have an inherently nested structure: the inner loop finds the optimal policy given parametrized rewards while the outer loop updates the estimates towards optimizing a measure of fit. For high dimensional environments such nested-loop structure entails a significant computational burden. To reduce the computational burden of a nested loop, novel methods such as SQIL [1] and IQ-Learn [2] emphasize policy estimation at the expense of reward estimation accuracy. However, without accurate estimated rewards, it is not possible to do counterfactual analysis such as predicting the optimal policy under different environment dynamics and/or learning new tasks. In this paper we develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy. In the prop
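The nested-loop structure described above can be illustrated on a toy tabular problem. The sketch below is a minimal example under stated assumptions. It is not the authors' single-loop algorithm. It assumes a known transition tensor `P[s, a, s']`, a tabular reward `theta[s, a]`, an entropy-regularized (soft) Bellman inner loop, and exact expert occupancy measures in place of sampled demonstrations. The function names `soft_policy`, `occupancy` and `ml_irl_nested`, and the 3-state MDP, are hypothetical. The outer step relies on the standard maximum-entropy IRL identity: the gradient of the discounted expert log-likelihood equals expert discounted visitation minus learner discounted visitation.

```python
import numpy as np

def soft_policy(r, P, gamma, tol=1e-10, max_iters=10_000):
    """Inner loop: soft value iteration for reward r[s, a] under dynamics P[s, a, s'].

    Returns the soft-optimal policy pi[s, a] = softmax_a Q[s, a].
    """
    V = np.zeros(r.shape[0])
    for _ in range(max_iters):
        Q = r + gamma * (P @ V)                      # contracts s': Q has shape (S, A)
        m = Q.max(axis=1, keepdims=True)
        V_new = (m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True)))[:, 0]
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    Q = r + gamma * (P @ V)
    Q -= Q.max(axis=1, keepdims=True)                # stabilise before exponentiating
    pi = np.exp(Q)
    return pi / pi.sum(axis=1, keepdims=True)

def occupancy(pi, P, rho0, gamma):
    """Discounted visitation mu[s, a] = sum_t gamma^t Pr(s_t = s, a_t = a)."""
    P_pi = np.einsum("sa,sat->st", pi, P)            # state-to-state transitions under pi
    d = np.linalg.solve(np.eye(len(rho0)) - gamma * P_pi.T, rho0)
    return d[:, None] * pi

def ml_irl_nested(mu_expert, P, rho0, gamma, lr=0.2, outer_iters=3000):
    """Outer loop: gradient ascent on the expert log-likelihood over theta[s, a]."""
    theta = np.zeros_like(mu_expert)
    for _ in range(outer_iters):
        pi = soft_policy(theta, P, gamma)            # inner loop re-solved at every step
        theta += lr * (mu_expert - occupancy(pi, P, rho0, gamma))
    return theta, soft_policy(theta, P, gamma)

# Hypothetical toy MDP: action 0 mostly stays put, action 1 mostly advances.
S, A, gamma = 3, 2, 0.5
P = np.zeros((S, A, S))
for s in range(S):
    P[s, 0, s] += 0.8
    P[s, 0, (s + 1) % S] += 0.2
    P[s, 1, (s + 1) % S] += 0.8
    P[s, 1, s] += 0.2
rho0 = np.full(S, 1.0 / S)
r_true = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, -0.5]])

pi_expert = soft_policy(r_true, P, gamma)
mu_expert = occupancy(pi_expert, P, rho0, gamma)     # stands in for expert demonstrations
theta, pi_hat = ml_irl_nested(mu_expert, P, rho0, gamma)
print(np.abs(pi_hat - pi_expert).max())
```

Each outer step re-runs the inner soft value iteration to convergence. That repeated inner solve is the computational burden the abstract attributes to nested-loop methods. In this entropy-regularized setting the reward is identifiable only up to potential-based shaping, so the check compares the recovered policy `pi_hat` with the expert policy rather than comparing `theta` with `r_true`.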
Related papers
- Towards Theoretical Understanding Of Inverse Reinforcement Learning (2023)
- Inverse Reinforcement Learning With Explicit Policy Estimates (2021)
- In-trajectory Inverse Reinforcement Learning: Learn Incrementally Before An Ongoing Trajectory Terminates (2024)
- Inverse Reinforcement Learning Without Reinforcement Learning (2023)
- On The Effective Horizon Of Inverse Reinforcement Learning (2023)
- Inverse Reinforcement Learning With Simultaneous Estimation Of Rewards And Dynamics (2016)
- Inverse Reinforcement Learning In A Continuous State Space With Formal Guarantees (2021)
- Offline Inverse RL: New Solution Concepts And Provably Efficient Algorithms (2024)