Inverse Reinforcement Learning From Non-stationary Learning Agents
2024 · Kavinayan P. Sivakumar, Yi Shen, Zachary Bell, et al.
Abstract
In this paper, we study an inverse reinforcement learning problem that involves learning the reward function of a learning agent using trajectory data collected while this agent is learning its optimal policy. To address this problem, we propose an inverse reinforcement learning method that allows us to estimate the policy parameters of the learning agent which can then be used to estimate its reward function. Our method relies on a new variant of the behavior cloning algorithm, which we call bundle behavior cloning, and uses a small number of trajectories generated by the learning agent's policy at different points in time to learn a set of policies that match the distribution of actions observed in the sampled trajectories. We then use the cloned policies to train a neural network model that estimates the reward function of the learning agent. We provide a theoretical analysis to show a complexity result on bound guarantees for our method that beats standard behavior cloning as well
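To make the bundling idea in the abstract concrete, here is a minimal sketch under simplifying assumptions. It groups time-ordered trajectories into consecutive bundles and fits one policy per bundle by matching the observed action frequencies. The tabular, count-based estimator, the function name `bundle_behavior_cloning`, and the `bundle_size` parameter are illustrative assumptions, not the authors' implementation; the abstract does not specify the policy class or the bundle size. The later step of training a reward model from the cloned policies is not shown.

```python
from collections import Counter, defaultdict


def bundle_behavior_cloning(trajectories, bundle_size):
    """Fit one empirical policy per bundle of consecutive trajectories.

    Assumption: `trajectories` is ordered by collection time, and each
    trajectory is a list of (state, action) pairs with hashable entries.
    Returns a list of policies, each mapping state -> {action: probability}.
    """
    if bundle_size < 1:
        raise ValueError("bundle_size must be at least 1")

    policies = []
    for start in range(0, len(trajectories), bundle_size):
        bundle = trajectories[start:start + bundle_size]

        # Count how often each action was taken in each state in this bundle.
        counts = defaultdict(Counter)
        for trajectory in bundle:
            for state, action in trajectory:
                counts[state][action] += 1

        # Normalize counts so the cloned policy matches the action
        # distribution observed within the bundle.
        policy = {}
        for state, action_counts in counts.items():
            total = sum(action_counts.values())
            policy[state] = {a: n / total for a, n in action_counts.items()}
        policies.append(policy)
    return policies


# Toy data (hypothetical): in state 0 the learner drifts from "left" to "right".
trajectories = [
    [(0, "left"), (0, "left")],
    [(0, "left"), (0, "right")],
    [(0, "right"), (0, "right")],
    [(0, "right"), (0, "right")],
]
policies = bundle_behavior_cloning(trajectories, bundle_size=2)
```

Each entry of `policies` is a snapshot of the learner's behavior over one window of time. In this toy setting, a larger `bundle_size` gives more samples per estimate but averages over a longer stretch of the agent's changing policy.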
Related papers
- In-trajectory Inverse Reinforcement Learning: Learn Incrementally Before An Ongoing Trajectory Terminates (2024)
- Reward-conditioned Policies (2019)
- Inverse Reinforcement Learning With Missing Data (2019)
- Learning Long-term Reward Redistribution Via Randomized Return Decomposition (2021)
- Inverse Reinforcement Learning With Simultaneous Estimation Of Rewards And Dynamics (2016)
- Adversarial Recovery Of Agent Rewards From Latent Spaces Of The Limit Order Book (2019)
- Discovering Individual Rewards In Collective Behavior Through Inverse Multi-agent Reinforcement Learning (2023)
- Towards Theoretical Understanding Of Inverse Reinforcement Learning (2023)