Policy Certificates: Towards Accountable Reinforcement Learning
2018 · Christoph Dann, Lihong Li, Wei Wei, et al.
Abstract
The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration. Existing algorithms provide little information about the quality of their current policy before executing it, and thus have limited use in high-stakes applications like healthcare. We address this lack of accountability by proposing that algorithms output policy certificates. These certificates bound the sub-optimality and return of the policy in the next episode, allowing humans to intervene when the certified quality is not satisfactory. We further introduce two new algorithms with certificates and present a new framework for theoretical analysis that guarantees the quality of their policies and certificates. For tabular MDPs, we show that computing certificates can even improve the sample-efficiency of optimism-based exploration. As a result, one of our algorithms is the first to achieve minimax-optimal PAC bounds up to lower-order terms, and this algorithm also matches
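To make the idea of a certificate concrete, here is a minimal sketch for a tabular, finite-horizon MDP. It is not the paper's algorithm: the function name `certificate`, its arguments, and the Hoeffding-style bonus are all assumptions chosen for illustration, and the bonus does not carry the guarantees claimed in the abstract. The sketch computes an optimistic upper bound on the optimal value and a pessimistic lower bound on the value of the greedy policy, then reports the gap at the start state as the certificate.

```python
import numpy as np

def certificate(P_hat, R_hat, counts, H, s0, delta=0.1):
    """Hypothetical helper: return (policy, return lower bound, certificate).

    P_hat:  (S, A, S) empirical transition probabilities
    R_hat:  (S, A) empirical mean rewards, assumed to lie in [0, 1]
    counts: (S, A) visit counts
    """
    S, A = R_hat.shape
    # Illustrative Hoeffding-style confidence width (an assumption, not the
    # paper's bonus); scaled by H because values lie in [0, H].
    bonus = H * np.sqrt(np.log(2 * S * A * H / delta) / (2 * np.maximum(counts, 1)))

    V_up = np.zeros(S)   # upper bound on the optimal value
    V_lo = np.zeros(S)   # lower bound on the value of the chosen policy
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        remaining = H - h  # maximum return still attainable from step h
        Q_up = np.clip(R_hat + P_hat @ V_up + bonus, 0, remaining)
        Q_lo = np.clip(R_hat + P_hat @ V_lo - bonus, 0, remaining)
        pi = Q_up.argmax(axis=1)          # act optimistically
        V_up = Q_up[np.arange(S), pi]
        V_lo = Q_lo[np.arange(S), pi]     # evaluate the same actions pessimistically
        policy[h] = pi

    # The gap bounds how sub-optimal the policy can be in the next episode,
    # provided the confidence bounds hold.
    return policy, V_lo[s0], V_up[s0] - V_lo[s0]
```

A caller could compare the returned certificate against a tolerance before each episode and hand control to a human or a fallback policy when it is too large, which is the kind of intervention the abstract describes.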
Related papers
- Conservative Exploration For Policy Optimization Via Off-policy Policy Evaluation (2023)
- Certifiably Robust Reinforcement Learning Through Model-based Abstract Interpretation (2023)
- Beyond Expected Return: Accounting For Policy Reproducibility When Evaluating Reinforcement Learning Algorithms (2023)
- Certifying Safety In Reinforcement Learning Under Adversarial Perturbation Attacks (2022)
- Policy Improvement Reinforcement Learning (2026)
- Some Supervision Required: Incorporating Oracle Policies In Reinforcement Learning Via Epistemic Uncertainty Metrics (2022)
- Efficient Algorithms For Mitigating Uncertainty And Risk In Reinforcement Learning (2025)
- Learning Safe Policies With Expert Guidance (2018)