When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework For Offline Inverse Reinforcement Learning
2023 · Siliang Zeng, Chenliang Li, Alfredo Garcia, et al.
Abstract
Offline inverse reinforcement learning (Offline IRL) aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent. Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving. However, the structure of an expert's preferences implicit in observed actions is closely linked to the expert's model of the environment dynamics (i.e., the "world" model). Thus, inaccurate models of the world obtained from finite data with limited coverage could compound inaccuracy in estimated rewards. To address this issue, we propose a bi-level optimization formulation of the estimation task wherein the upper level is likelihood maximization based upon a conservative model of the expert's policy (lower level). The policy model is conservative in that it maximizes reward subject to a penalty that is increasing in the u
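The bi-level structure described in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation. The abstract is cut off before it specifies the penalty, so the sketch assumes a given non-negative penalty table that is subtracted from the reward. It also assumes an entropy-regularized (soft) policy so that action probabilities, and hence a likelihood, are well defined. All names here (`soft_policy`, `log_likelihood`, the toy tables) are hypothetical.

```python
import math

def soft_policy(reward, penalty, trans, gamma=0.9, iters=200):
    """Lower level (assumed form): soft value iteration on an estimated model.

    reward[s][a]  : current reward estimate
    penalty[s][a] : non-negative penalty, assumed larger where data coverage is poor
    trans[s][a]   : next-state probabilities under the estimated world model
    Returns pi[s][a], a policy that is soft-optimal for reward minus penalty.
    """
    n_s, n_a = len(reward), len(reward[0])
    v = [0.0] * n_s
    for _ in range(iters):
        q = [[reward[s][a] - penalty[s][a]
              + gamma * sum(p * v[s2] for s2, p in enumerate(trans[s][a]))
              for a in range(n_a)] for s in range(n_s)]
        # Soft maximum over actions: a numerically stable log-sum-exp.
        v = [max(qs) + math.log(sum(math.exp(x - max(qs)) for x in qs)) for qs in q]
    return [[math.exp(q[s][a] - v[s]) for a in range(n_a)] for s in range(n_s)]

def log_likelihood(pi, demos):
    """Upper-level objective: log-likelihood of demonstrated (state, action) pairs."""
    return sum(math.log(pi[s][a]) for s, a in demos)

# Toy problem: 2 states, 2 actions; action a moves deterministically to state a.
reward = [[0.0, 1.0], [0.0, 1.0]]
trans = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
no_penalty = [[0.0, 0.0], [0.0, 0.0]]
penalty = [[0.0, 2.0], [0.0, 0.0]]  # hypothetical: (s=0, a=1) is poorly covered
demos = [(0, 1), (1, 1)]

pi_free = soft_policy(reward, no_penalty, trans)
pi_cons = soft_policy(reward, penalty, trans)
print(pi_free[0][1], pi_cons[0][1])
print(log_likelihood(pi_cons, demos))
```

In a full estimation loop, the upper level would adjust the reward parameters to increase `log_likelihood`, re-solving the lower level after each update. The sketch only evaluates the objective for one fixed reward table.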
Related papers
- Offline Inverse RL: New Solution Concepts And Provably Efficient Algorithms (2024)
- A Bayesian Approach To Robust Inverse Reinforcement Learning (2023)
- CLARE: Conservative Model-based Reward Learning For Offline Inverse Reinforcement Learning (2023)
- Is Inverse Reinforcement Learning Harder Than Standard Reinforcement Learning? A Theoretical Perspective (2023)
- Active Exploration For Inverse Reinforcement Learning (2022)
- Offsim: Offline Simulator For Model-based Offline Inverse Reinforcement Learning (2025)
- Distributional Inverse Reinforcement Learning (2025)
- Maximum-likelihood Inverse Reinforcement Learning With Finite-time Guarantees (2022)