Conditional Importance Sampling For Off-policy Learning
2019 · Mark Rowland, Anna Harutyunyan, Hado van Hasselt, et al.
Abstract
The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
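As a rough illustration of what replacing an importance sampling ratio with its conditional expectation can look like, the sketch below uses a hypothetical one-step problem that is not taken from the paper. The policies `pi` and `mu`, the rewards, and the choice to condition on the reward are all assumptions made for the example. It computes exactly, without sampling, the mean and variance of the ordinary importance-weighted reward and of the version in which the ratio is replaced by its expectation given the reward.

```python
import numpy as np

# Hypothetical one-step problem: three actions with deterministic rewards.
pi = np.array([0.6, 0.3, 0.1])      # target policy (assumed for illustration)
mu = np.array([0.2, 0.3, 0.5])      # behaviour policy (assumed for illustration)
reward = np.array([1.0, 1.0, 0.0])  # actions 0 and 1 yield the same reward

# Ordinary importance sampling ratios pi(a) / mu(a).
rho = pi / mu

# Conditional ratios: E_mu[rho | reward], one value per distinct reward.
cond_rho = np.empty_like(rho)
for r in np.unique(reward):
    mask = reward == r
    cond_rho[mask] = np.sum(mu[mask] * rho[mask]) / np.sum(mu[mask])


def mean_and_variance(weights):
    """Exact mean and variance of weights * reward under the behaviour policy."""
    values = weights * reward
    mean = np.sum(mu * values)
    var = np.sum(mu * (values - mean) ** 2)
    return mean, var


target_value = np.sum(pi * reward)               # expected reward under pi
is_mean, is_var = mean_and_variance(rho)         # ordinary importance sampling
cis_mean, cis_var = mean_and_variance(cond_rho)  # conditional importance sampling

print(f"target value:      {target_value:.3f}")
print(f"ordinary IS:       mean={is_mean:.3f}, variance={is_var:.3f}")
print(f"conditional IS:    mean={cis_mean:.3f}, variance={cis_var:.3f}")
```

In this toy setting both estimators have the same mean as the target value, and the conditional version has lower variance, as the law of total variance would suggest. The algorithms studied in the paper operate on trajectories and returns rather than a single action, so this should be read only as an intuition aid.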
Related papers
- Relative Importance Sampling For Off-policy Actor-critic In Deep Reinforcement Learning (2018) · 2.26
- Low Variance Off-policy Evaluation With State-based Importance Sampling (2022) · 0.00
- Importance Sampling Policy Evaluation With An Estimated Behavior Policy (2018) · 0.00
- Counterfactual-augmented Importance Sampling For Semi-offline Policy Evaluation (2023) · 0.00
- Sample Dropout: A Simple Yet Effective Variance Reduction Technique In Deep Policy Optimization (2023) · 0.00
- Robust On-policy Sampling For Data-efficient Policy Evaluation In Reinforcement Learning (2021) · 0.00
- Handling Cost And Constraints With Off-policy Deep Reinforcement Learning (2023) · 0.00
- Behaviour Policy Optimization: Provably Lower Variance Return Estimates For Off-policy Reinforcement Learning (2025) · 0.00