PerSim: Data-efficient Offline Reinforcement Learning With Heterogeneous Agents Via Personalized Simulators
2021 · Anish Agarwal, Abdullah Alomar, Varkey Alumootil, et al.
Abstract
We consider offline reinforcement learning (RL) with heterogeneous agents under severe data scarcity, i.e., we only observe a single historical trajectory for every agent under an unknown, potentially sub-optimal policy. We find that the performance of state-of-the-art offline and model-based RL methods degrades significantly given such limited data availability, even for commonly perceived "solved" benchmark settings such as "MountainCar" and "CartPole". To address this challenge, we propose PerSim, a model-based offline RL approach which first learns a personalized simulator for each agent by collectively using the historical trajectories across all agents, prior to learning a policy. We do so by positing that the transition dynamics across agents can be represented as a latent function of latent factors associated with agents, states, and actions; subsequently, we theoretically establish that this function is well-approximated by a "low-rank" decomposition of separable agent, state,
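The low-rank idea in the abstract can be illustrated with a small numerical sketch. This is not PerSim's implementation: it only builds a synthetic dynamics tensor as a sum of separable terms and checks that the tensor is low-rank when unfolded along the agent axis. The abstract is cut off after "state,", so treating the third factor as an action factor is an assumption, based on the earlier phrase "latent factors associated with agents, states, and actions". The function name, the dimensions, and the rank below are all hypothetical.

```python
import numpy as np


def low_rank_dynamics(agent_f, state_f, action_f):
    """Build f[n, s, a] = sum_l agent_f[n, l] * state_f[s, l] * action_f[a, l].

    Each input is a (size, rank) matrix of latent factors (hypothetical names).
    """
    return np.einsum("nl,sl,al->nsa", agent_f, state_f, action_f)


# Illustrative sizes only; none of these values come from the paper.
rng = np.random.default_rng(0)
n_agents, n_states, n_actions, rank = 6, 5, 4, 2

U = rng.normal(size=(n_agents, rank))   # agent latent factors
V = rng.normal(size=(n_states, rank))   # state latent factors
W = rng.normal(size=(n_actions, rank))  # action latent factors (assumed)

F = low_rank_dynamics(U, V, W)

# Unfold the tensor so each row holds one agent's values over all
# (state, action) pairs; the matrix rank cannot exceed the number of terms.
unfolded = F.reshape(n_agents, n_states * n_actions)
unfolded_rank = np.linalg.matrix_rank(unfolded)

print("tensor shape:", F.shape)
print("rank of agent unfolding:", unfolded_rank)
```

Because every agent's row is a combination of the same few shared terms, data from all agents can inform each agent's model, which is the intuition behind pooling trajectories across agents.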
Related papers
- Towards Data-driven Offline Simulations For Online Reinforcement Learning (2022)
- Reinforcement Learning For Individual Optimal Policy From Heterogeneous Data (2025)
- When To Trust Your Simulator: Dynamics-aware Hybrid Offline-and-online Reinforcement Learning (2022)
- Offsim: Offline Simulator For Model-based Offline Inverse Reinforcement Learning (2025)
- Offline Fictitious Self-play For Competitive Games (2024)
- Using Offline Data To Speed Up Reinforcement Learning In Procedurally Generated Environments (2023)
- Behavior Estimation From Multi-source Data For Offline Reinforcement Learning (2022)
- Bridging Offline Reinforcement Learning And Imitation Learning: A Tale Of Pessimism (2021)