PsiPhi-Learning: Reinforcement Learning With Demonstrations Using Successor Features And Inverse Temporal Difference Learning
2021 · Angelos Filos, Clare Lyle, Yarin Gal, et al.
Abstract
We study reinforcement learning (RL) with no-reward demonstrations, a setting in which an RL agent has access to additional data from the interaction of other agents with the same environment. However, it has no access to the rewards or goals of these agents, and their objectives and levels of expertise may vary widely. These assumptions are common in multi-agent settings, such as autonomous driving. To effectively use this data, we turn to the framework of successor features. This allows us to disentangle shared features and dynamics of the environment from agent-specific rewards and policies. We propose a multi-task inverse reinforcement learning (IRL) algorithm, called *inverse temporal difference learning* (ITD), that learns shared state features, alongside per-agent successor features and preference vectors, purely from demonstrations without reward labels. We further show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algor
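The decomposition the abstract relies on can be illustrated with a small sketch. This is not the ITD algorithm itself. It only shows the standard successor-feature identity: if an agent's reward is linear in shared features, r = phi · w, then its discounted return equals psi · w, where psi is the discounted sum of features. The feature values, discount factor, preference vectors, and helper names below are hypothetical and chosen purely for illustration.

```python
def discounted_sum(vectors, gamma):
    """Return sum_t gamma**t * vectors[t], accumulated from the last step backwards."""
    total = [0.0] * len(vectors[0])
    for vec in reversed(vectors):
        total = [v + gamma * t for v, t in zip(vec, total)]
    return total


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical shared features phi(s_t, a_t) along a single trajectory.
phi_trajectory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gamma = 0.5  # hypothetical discount factor

# Hypothetical preference vectors: two agents, same features, different goals.
w_agent_a = [1.0, 0.0]
w_agent_b = [0.0, 2.0]

# Successor features of this trajectory (a single-sample estimate).
psi = discounted_sum(phi_trajectory, gamma)

# Discounted return computed directly from each agent's per-step rewards.
return_a = discounted_sum([[dot(phi, w_agent_a)] for phi in phi_trajectory], gamma)[0]
return_b = discounted_sum([[dot(phi, w_agent_b)] for phi in phi_trajectory], gamma)[0]

# The same returns recovered from the shared psi and each agent's preferences.
value_a = dot(psi, w_agent_a)
value_b = dot(psi, w_agent_b)

print(psi, value_a, value_b)
```

Because psi depends only on the features and the behaviour that generated the trajectory, the same psi can be scored under any preference vector, which is the separation between shared structure and agent-specific rewards that the abstract describes.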
Related papers
- Non-adversarial Inverse Reinforcement Learning Via Successor Feature Matching (2024)
- Distance-rank Aware Sequential Reward Learning For Inverse Reinforcement Learning With Sub-optimal Demonstrations (2023)
- Task-guided Inverse Reinforcement Learning Under Partial Information (2021)
- Basis For Intentions: Efficient Inverse Reinforcement Learning Using Past Experience (2022)
- A Dual Approach To Imitation Learning From Observations With Offline Datasets (2024)
- Inverse Reinforcement Learning Without Reinforcement Learning (2023)
- Interactive Reinforcement Learning With Dynamic Reuse Of Prior Knowledge From Human/agent's Demonstration (2018)
- Inverse Reinforcement Learning With Simultaneous Estimation Of Rewards And Dynamics (2016)