Ο^2-bench
Emerging · 3 papers using it
First seen: 2026
Ο^2-Bench is a benchmark dataset for evaluating online warning monitors for large language model agents. It measures how well a monitor predicts risks from traces of agent actions.