Rewriting History With Inverse RL: Hindsight Inference For Policy Improvement
2020 · Benjamin Eysenbach, Xinyang Geng, Sergey Levine, et al.
Abstract
Multi-task reinforcement learning (RL) aims to simultaneously learn policies for solving many tasks. Several prior works have found that relabeling past experience with different reward functions can improve sample efficiency. Relabeling methods typically ask: if, in hindsight, we assume that our experience was optimal for some task, for what task was it optimal? In this paper, we show that hindsight relabeling is inverse RL, an observation that suggests that we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks. We use this idea to generalize goal-relabeling techniques from prior work to arbitrary classes of tasks. Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings, including goal-reaching, domains with discrete sets of rewards, and those with linear reward functions.
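The relabeling question posed in the abstract ("for what task was this experience optimal?") can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a discrete set of candidate tasks, a maximum-entropy-style posterior in which a trajectory's probability of being assigned to a task grows with its return under that task's reward, and a per-task normalizer estimated from the batch itself. The function name `relabel_tasks` and its parameters are hypothetical.

```python
import numpy as np


def relabel_tasks(returns, log_prior=None, rng=None):
    """Sample a hindsight task label for each trajectory.

    returns[i, j] is the summed reward of trajectory i under the reward
    function of candidate task j (an assumed input, computed elsewhere).
    """
    returns = np.asarray(returns, dtype=float)
    n_traj, n_tasks = returns.shape
    if log_prior is None:
        # Assumption: uniform prior over tasks.
        log_prior = np.full(n_tasks, -np.log(n_tasks))

    # Per-task log normalizer, estimated as log-mean-exp over the batch.
    # Without it, a task whose rewards are large everywhere would win
    # every relabeling regardless of the trajectory.
    col_max = returns.max(axis=0)
    log_z = col_max + np.log(np.exp(returns - col_max).mean(axis=0))

    # Unnormalized log posterior over tasks for each trajectory.
    logits = returns - log_z + log_prior
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    rng = np.random.default_rng() if rng is None else rng
    labels = np.array([rng.choice(n_tasks, p=p) for p in probs])
    return probs, labels
```

In this sketch, each trajectory would then be stored in the replay buffer with its sampled label and the corresponding reward, and an off-policy RL algorithm would train on the relabeled data. The batch-based normalizer is one simple choice among several and is only an approximation.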
Related papers
- Hindsight Foresight Relabeling For Meta-reinforcement Learning (2021)
- Hindsight Policy Gradients (2017)
- Hindsight Trust Region Policy Optimization (2019)
- Hindsight Priors For Reward Learning From Human Preferences (2024)
- Basis For Intentions: Efficient Inverse Reinforcement Learning Using Past Experience (2022)
- Replacing Rewards With Examples: Example-based Policy Search Via Recursive Classification (2021)
- Imitating Past Successes Can Be Very Suboptimal (2022)
- Reward Shaping For Human Learning Via Inverse Reinforcement Learning (2020)