Scalable Safety-critical Policy Evaluation With Accelerated Rare Event Sampling
2021 · Mengdi Xu, Peide Huang, Fengpei Li, et al.
Abstract
Evaluating rare but high-stakes events is one of the main challenges in obtaining reliable reinforcement learning policies, especially in large or infinite state/action spaces where limited scalability dictates a prohibitively large number of testing iterations. On the other hand, a biased or inaccurate policy evaluation in a safety-critical system could cause unexpected catastrophic failures during deployment. This paper proposes the Accelerated Policy Evaluation (APE) method, which simultaneously uncovers rare events and estimates the rare event probability in Markov decision processes. The APE method treats the environment's nature as an adversarial agent and uses adaptive importance sampling to learn the zero-variance sampling distribution for the policy evaluation. Moreover, APE scales to large discrete or continuous spaces by incorporating function approximators. We investigate the convergence property of APE in the tabular setting. Our empirical studies
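The adaptive importance sampling idea mentioned in the abstract can be illustrated on a toy problem. The sketch below is not APE itself: it is a cross-entropy-style adaptive importance sampler that estimates the tail probability P(X > 4) for a standard normal X, using a mean-shifted Gaussian proposal. The function names, the Gaussian setting, the threshold, and all parameter values are assumptions chosen for illustration only.

```python
import math
import random


def likelihood_ratio(x, mu):
    """Density ratio N(0,1) / N(mu,1) evaluated at x."""
    return math.exp(-mu * x + 0.5 * mu * mu)


def adaptive_is_tail_prob(threshold=4.0, n=20000, iters=5, elite_frac=0.1, seed=0):
    """Estimate P(X > threshold) for X ~ N(0,1) with an adapted proposal N(mu,1).

    Hypothetical toy example: the proposal mean is moved toward the rare
    region in stages, then a final importance-weighted estimate is taken.
    """
    rng = random.Random(seed)
    mu = 0.0
    for _ in range(iters):
        xs = [rng.gauss(mu, 1.0) for _ in range(n)]
        # Intermediate level: the elite quantile, capped at the true threshold.
        level = min(threshold, sorted(xs)[int((1 - elite_frac) * n)])
        elite = [(x, likelihood_ratio(x, mu)) for x in xs if x >= level]
        total_w = sum(w for _, w in elite)
        # Weighted mean of elite samples approximates E[X | X >= level]
        # under the original N(0,1) distribution.
        mu = sum(x * w for x, w in elite) / total_w
        if level >= threshold:
            break
    # Final unbiased estimate under the adapted proposal.
    xs = [rng.gauss(mu, 1.0) for _ in range(n)]
    estimate = sum(likelihood_ratio(x, mu) for x in xs if x > threshold) / n
    return estimate, mu


if __name__ == "__main__":
    est, mu = adaptive_is_tail_prob()
    exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
    print(f"estimate={est:.3e}  exact={exact:.3e}  proposal mean={mu:.2f}")
```

Plain Monte Carlo with the same sample size would typically observe the event zero or one time, which is why the proposal is shifted toward the rare region and each sample is reweighted by the likelihood ratio.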
Related papers
- Efficient Policy Evaluation With Safety Constraint For Reinforcement Learning (2024)
- Towards Optimal Off-policy Evaluation For Reinforcement Learning With Marginalized Importance Sampling (2019)
- Robust On-policy Sampling For Data-efficient Policy Evaluation In Reinforcement Learning (2021)
- Empirical Study Of Off-policy Policy Evaluation For Reinforcement Learning (2019)
- Counterfactual-augmented Importance Sampling For Semi-offline Policy Evaluation (2023)
- Policy Search With Rare Significant Events: Choosing The Right Partner To Cooperate With (2021)
- Rigorous Agent Evaluation: An Adversarial Approach To Uncover Catastrophic Failures (2018)
- Statistically Efficient Variance Reduction With Double Policy Estimation For Off-policy Evaluation In Sequence-modeled Reinforcement Learning (2023)