Rigorous Agent Evaluation: An Adversarial Approach To Uncover Catastrophic Failures
2018 · Jonathan Uesato, Ananya Kumar, Csaba Szepesvari, et al.
Abstract
This paper addresses the problem of evaluating learning systems in safety-critical domains such as autonomous driving, where failures can have catastrophic consequences. We focus on two problems: searching for scenarios in which learned agents fail, and assessing their probability of failure. The standard method for agent evaluation in reinforcement learning, Vanilla Monte Carlo, can miss failures entirely, leading to the deployment of unsafe agents. We demonstrate that this is an issue for current agents, where even matching the compute used for training is sometimes insufficient for evaluation. To address this shortcoming, we draw upon the rare event probability estimation literature and propose an adversarial evaluation approach. Our approach focuses evaluation on adversarially chosen situations, while still providing unbiased estimates of failure probabilities. The key difficulty is in identifying these adversarial situations -- since failures are rare there is little signal to drive optimiz
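A minimal sketch of why Vanilla Monte Carlo can miss failures entirely, assuming episodes are independent and each fails with a fixed probability p. The function names and the numbers in the usage note are illustrative assumptions, not taken from the paper. Under that assumption, the chance of observing zero failures in n episodes is (1 - p)^n.

```python
import math


def prob_no_failure_observed(p: float, n: int) -> float:
    """Probability that n independent episodes all succeed,
    when each episode fails with probability p (assumed fixed)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return (1.0 - p) ** n


def episodes_needed(p: float, confidence: float) -> int:
    """Smallest n such that at least one failure is observed
    with probability >= confidence, i.e. 1 - (1 - p)^n >= confidence."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    # Both logs are negative, so the ratio is positive.
    return math.ceil(math.log1p(-confidence) / math.log1p(-p))
```

For a hypothetical failure rate of p = 1e-4, `prob_no_failure_observed(1e-4, 10_000)` is about 0.37: roughly one evaluation run in three of that size would report no failures at all.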
Related papers
- On Assessing The Safety Of Reinforcement Learning Algorithms Using Formal Methods (2021) 0.00
- Constrained Black-box Attacks Against Cooperative Multi-agent Reinforcement Learning (2025) 0.00
- Certifying Safety In Reinforcement Learning Under Adversarial Perturbation Attacks (2022) 0.00
- Scalable Safety-critical Policy Evaluation With Accelerated Rare Event Sampling (2021) 0.00
- Interpretable Failure Analysis In Multi-agent Reinforcement Learning Systems (2026) 0.00
- Toward Evaluating Robustness Of Reinforcement Learning With Adversarial Policy (2023) 4.52
- Adversarial Reinforcement Learning For Observer Design In Autonomous Systems Under Cyber Attacks (2018) 0.00
- Vulnerable Agent Identification In Large-scale Multi-agent Reinforcement Learning (2025) 0.00