Robust On-policy Sampling For Data-efficient Policy Evaluation In Reinforcement Learning
2021 · Rujie Zhong, Duohan Zhang, Lukas Schäfer, et al.
Abstract
Reinforcement learning (RL) algorithms are often categorized as either on-policy or off-policy depending on whether they use data from a target policy of interest or from a different behavior policy. In this paper, we study a subtle distinction between on-policy data and on-policy sampling in the context of the RL sub-problem of policy evaluation. We observe that on-policy sampling may fail to match the expected distribution of on-policy data after observing only a finite number of trajectories and this failure hinders data-efficient policy evaluation. Towards improved data-efficiency, we show how non-i.i.d., off-policy sampling can produce data that more closely matches the expected on-policy data distribution and consequently increases the accuracy of the Monte Carlo estimator for policy evaluation. We introduce a method called Robust On-Policy Sampling and demonstrate theoretically and empirically that it produces data that converges faster to the expected on-policy distribution com
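The gap between on-policy sampling and the expected on-policy distribution can be illustrated with a small sketch. It is not the paper's Robust On-Policy Sampling method: it assumes a hypothetical one-step, three-action problem, and it uses a simple deterministic rule (pick the most under-sampled action) as a stand-in for non-i.i.d. sampling. The policy `pi`, the `rewards` table, and all function names are illustrative assumptions.

```python
import random

def iid_sample(pi, n, rng):
    """Draw n actions i.i.d. from the target policy pi (on-policy sampling)."""
    actions = list(pi)
    weights = [pi[a] for a in actions]
    return rng.choices(actions, weights=weights, k=n)

def undersampled_first(pi, n):
    """Non-i.i.d. stand-in: at each step pick the action whose count lags pi most."""
    counts = {a: 0 for a in pi}
    data = []
    for t in range(1, n + 1):
        # Deficit = expected count after t draws minus the count observed so far.
        a = max(pi, key=lambda b: pi[b] * t - counts[b])
        counts[a] += 1
        data.append(a)
    return data

def total_variation(pi, data):
    """Total variation distance between the empirical action distribution and pi."""
    n = len(data)
    return 0.5 * sum(abs(data.count(a) / n - pi[a]) for a in pi)

def mc_estimate(rewards, data):
    """Monte Carlo estimate of the expected reward: plain average over the data."""
    return sum(rewards[a] for a in data) / len(data)

if __name__ == "__main__":
    # Hypothetical target policy and deterministic rewards (assumptions).
    pi = {"left": 0.5, "right": 0.3, "stay": 0.2}
    rewards = {"left": 1.0, "right": 0.0, "stay": 5.0}
    true_value = sum(pi[a] * rewards[a] for a in pi)

    rng = random.Random(0)
    for name, data in [("i.i.d.", iid_sample(pi, 10, rng)),
                       ("non-i.i.d.", undersampled_first(pi, 10))]:
        tv = total_variation(pi, data)
        err = abs(mc_estimate(rewards, data) - true_value)
        print(f"{name}: TV distance = {tv:.3f}, |MC error| = {err:.3f}")
```

With only ten samples, the i.i.d. draw will typically leave some sampling error in the empirical action frequencies, while the deterministic rule keeps them close to `pi`. Since the Monte Carlo estimate is an average over the collected data, a closer match to `pi` translates directly into a smaller estimation error in this toy setting.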
Related papers
- On-policy Policy Gradient Reinforcement Learning Without On-policy Sampling (2023) · 0.00
- Online Estimation And Inference For Robust Policy Evaluation In Reinforcement Learning (2023) · 2.26
- Behaviour Policy Optimization: Provably Lower Variance Return Estimates For Off-policy Reinforcement Learning (2025) · 0.00
- Towards Optimal Off-policy Evaluation For Reinforcement Learning With Marginalized Importance Sampling (2019) · 0.00
- Conservative Exploration For Policy Optimization Via Off-policy Policy Evaluation (2023) · 0.00
- Off-policy RL Algorithms Can Be Sample-efficient For Continuous Control Via Sample Multiple Reuse (2023) · 0.00
- Distributionally Robust Model-based Offline Reinforcement Learning With Near-optimal Sample Complexity (2022) · 0.00
- Low Variance Off-policy Evaluation With State-based Importance Sampling (2022) · 0.00