Bayesian Off-policy Evaluation And Learning For Large Action Spaces
2024 · Imad Aouali, Victor-Emmanuel Brunel, David Rohde, et al.
Abstract
In interactive systems, actions are often correlated, presenting an opportunity for more sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We introduce a unified Bayesian framework to capture these correlations through structured and informative priors. In this framework, we propose sDM, a generic Bayesian approach for OPE and OPL, grounded in both algorithmic and theoretical foundations. Notably, sDM leverages action correlations without compromising computational efficiency. Moreover, inspired by online Bayesian bandits, we introduce Bayesian metrics that assess the average performance of algorithms across multiple problem instances, deviating from the conventional worst-case assessments. We analyze sDM in OPE and OPL, highlighting the benefits of leveraging action correlations. Empirical evidence showcases the strong performance of sDM.
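To make the idea of a structured prior over correlated actions concrete, here is a minimal sketch. It is not the paper's sDM: it assumes a context-free setting with Gaussian rewards, a zero-mean Gaussian prior over per-action mean rewards, and a known noise variance. The function names (`posterior_action_means`, `policy_value`) and all numbers are hypothetical. It only illustrates that, under a correlated prior, logged rewards for one action shift the posterior of actions that were never logged, which an independent prior cannot do.

```python
import numpy as np

def posterior_action_means(actions, rewards, n_actions, prior_cov, noise_var=1.0):
    """Posterior of theta under theta ~ N(0, prior_cov), r | a ~ N(theta[a], noise_var).

    Illustrative assumption: context-free Gaussian model, not the paper's sDM.
    """
    counts = np.bincount(actions, minlength=n_actions).astype(float)
    sums = np.bincount(actions, weights=rewards, minlength=n_actions)
    prior_prec = np.linalg.inv(prior_cov)
    # Gaussian conjugacy: precisions add, and the prior mean is zero.
    post_cov = np.linalg.inv(prior_prec + np.diag(counts) / noise_var)
    post_mean = post_cov @ (sums / noise_var)
    return post_mean, post_cov

def policy_value(policy, post_mean):
    """Direct-method style estimate: expected posterior mean reward under the policy."""
    return float(policy @ post_mean)

n_actions = 3
# Logged data covers action 0 only (five pulls, each with reward 1.0).
actions = np.zeros(5, dtype=int)
rewards = np.ones(5)

# Correlated prior: theta_a = shared latent + action-specific term.
corr_prior = 0.5 * np.eye(n_actions) + 0.5 * np.ones((n_actions, n_actions))
# Independent prior with the same marginal variances.
indep_prior = np.eye(n_actions)

mean_corr, _ = posterior_action_means(actions, rewards, n_actions, corr_prior)
mean_indep, _ = posterior_action_means(actions, rewards, n_actions, indep_prior)

# Target policy that plays only the unlogged action 1.
target = np.array([0.0, 1.0, 0.0])
value_corr = policy_value(target, mean_corr)
value_indep = policy_value(target, mean_indep)
```

With the correlated prior, the estimate for the unlogged action moves toward the rewards observed for action 0; with the independent prior it stays at the prior mean of zero.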
Related papers
- Doubly Robust Estimator For Off-policy Evaluation With Large Action Spaces (2023)
- POTEC: Off-policy Learning For Large Action Spaces Via Two-stage Policy Decomposition (2024)
- Local Metric Learning For Off-policy Evaluation In Contextual Bandits With Continuous Actions (2022)
- Bayesian Action Decoder For Deep Multi-agent Reinforcement Learning (2018)
- On Many-actions Policy Gradient (2022)
- Kernel Metric Learning For In-sample Off-policy Evaluation Of Deterministic RL Policies (2024)
- Double Reinforcement Learning For Efficient Off-policy Evaluation In Markov Decision Processes (2019)
- Intrinsically Efficient, Stable, And Bounded Off-policy Evaluation For Reinforcement Learning (2019)