An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning
2018 · Dhruv Malik, Malayandi Palaniappan, Jaime F. Fisac, et al.
Abstract
Our goal is for AI systems to correctly identify and act according to their human user's objectives. Cooperative Inverse Reinforcement Learning (CIRL) formalizes this value alignment problem as a two-player game between a human and robot, in which only the human knows the parameters of the reward function: the robot needs to learn them as the interaction unfolds. Previous work showed that CIRL can be solved as a POMDP, but with an action space size exponential in the size of the reward parameter space. In this work, we exploit a specific property of CIRL---the human is a full information agent---to derive an optimality-preserving modification to the standard Bellman update; this reduces the complexity of the problem by an exponential factor and allows us to relax CIRL's assumption of human rationality. We apply this update to a variety of POMDP solvers and find that it enables us to scale CIRL to non-trivial problems, with larger reward parameter spaces, and larger action spaces for bo