Federated Offline Policy Learning
2023 · Aldo Gael Carranza, Susan Athey
Abstract
We consider the problem of learning personalized decision policies from observational bandit feedback data across multiple heterogeneous data sources. We introduce a novel regret analysis that establishes finite-sample upper bounds on two distinct notions of regret: global regret across all data sources in aggregate, and local regret for any given data source. We characterize these regret bounds in terms of source heterogeneity and distribution shift. Moreover, we examine the practical considerations of this problem in the federated setting, where a central server aims to train a policy on data distributed across the heterogeneous sources without collecting any of their raw data. We present a policy learning algorithm amenable to federation, based on aggregating local policies trained with doubly robust offline policy evaluation strategies. Our analysis and supporting experimental results provide insights into tradeoffs in the participation of heterogeneous data so
Related papers
- Federated Offline Policy Optimization With Dual Regularization (2024)
- Federated Offline Reinforcement Learning: Collaborative Single-policy Coverage Suffices (2024)
- Federated Offline Reinforcement Learning (2022)
- Federated Ensemble-directed Offline Reinforcement Learning (2023)
- Reinforcement Learning For Individual Optimal Policy From Heterogeneous Data (2025)
- Policy Learning "without" Overlap: Pessimism And Generalized Empirical Bernstein's Inequality (2022)
- FedHPD: Heterogeneous Federated Reinforcement Learning Via Policy Distillation (2025)
- Offline Policy Evaluation For Reinforcement Learning With Adaptively Collected Data (2023)