Inverse Reinforcement Learning With The Average Reward Criterion
2023 · Feiyang Wu, Jingyang Ke, Anqi Wu
Abstract
We study the problem of Inverse Reinforcement Learning (IRL) with an average-reward criterion. The goal is to recover an unknown policy and a reward function when the agent only has samples of states and actions from an experienced agent. Previous IRL methods assume that the expert is trained in a discounted environment and that the discount factor is known. This work relaxes this assumption by proposing an average-reward framework with efficient learning algorithms. We develop novel stochastic first-order methods to solve the IRL problem under the average-reward setting, which requires solving an Average-reward Markov Decision Process (AMDP) as a subproblem. To solve the subproblem, we develop a Stochastic Policy Mirror Descent (SPMD) method under general state and action spaces that needs \(\mathcal{O}(1/\epsilon)\) steps of gradient computation. Equipped with SPMD, we propose the Inverse Policy Mirror Descent (IPMD) method for solving the IRL problem with a \(\mathcal{O}(1/\e
Related papers
- Towards Theoretical Understanding Of Inverse Reinforcement Learning (2023)
- Inverse Reinforcement Learning With Simultaneous Estimation Of Rewards And Dynamics (2016)
- Inverse Reinforcement Learning Without Reinforcement Learning (2023)
- Performance Bounds For Policy-based Average Reward Reinforcement Learning Algorithms (2023)
- Inverse Reinforcement Learning With Explicit Policy Estimates (2021)
- Is Inverse Reinforcement Learning Harder Than Standard Reinforcement Learning? A Theoretical Perspective (2023)
- Maximum-likelihood Inverse Reinforcement Learning With Finite-time Guarantees (2022)
- A Survey Of Inverse Reinforcement Learning: Challenges, Methods And Progress (2018)