Partial Identifiability And Misspecification In Inverse Reinforcement Learning

Abstract

The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function \(R\) from a policy \(\pi\). This problem is difficult for several reasons. First, there are typically multiple reward functions that are compatible with a given policy; this means that the reward function is only *partially identifiable*, and that IRL contains a certain fundamental degree of ambiguity. Second, in order to infer \(R\) from \(\pi\), an IRL algorithm must have a *behavioural model* of how \(\pi\) relates to \(R\). However, the true relationship between human preferences and human behaviour is very complex, and practically impossible to fully capture with a simple model. This means that, in practice, the behavioural model will be *misspecified*, which raises the worry that it might lead to unsound inferences if applied to real-world data. In this paper, we provide a comprehensive mathematical analysis of partial identifiability and misspecification in IRL. Specifically, we fully charact
