Reinforcement Learning For Individual Optimal Policy From Heterogeneous Data
2025 · Rui Miao, Babak Shahbaba, Annie Qu
Abstract
Offline reinforcement learning (RL) aims to find optimal policies in dynamic environments that maximize the expected total reward by leveraging pre-collected data. Learning from heterogeneous data is one of the fundamental challenges in offline RL. Traditional methods focus on learning an optimal policy for all individuals with pre-collected data from a single episode or homogeneous batch episodes, and thus may result in a suboptimal policy for a heterogeneous population. In this paper, we propose an individualized offline policy optimization framework for heterogeneous time-stationary Markov decision processes (MDPs). The proposed heterogeneous model with individual latent variables enables us to efficiently estimate the individual Q-functions, and our Penalized Pessimistic Personalized Policy Learning (P4L) algorithm guarantees a fast rate on the average regret under a weak partial coverage assumption on behavior policies. In addition, our simulation studies and a real data