CTIBench (Average) Leaderboard
CTIBench cyber-threat-intelligence benchmark: average score across its sub-tasks (CyNER, APTNER, CyNews, SecMMLU, CyQuiz, MITRE, CVE, and protocol F1). Evaluates LLM knowledge of the cyber-threat landscape. · Metric: Average Score (higher is better)
| # | Model | Average Score | Paper |
|---|---|---|---|
| 1 | GPT-4 | 69.60 | - |
| 2 | GPT-3.5-Turbo | 62.60 | - |
| 3 | Mistral-7B-v0.1 | 58.10 | - |
| 4 | Zephyr-7B-beta | 57.70 | - |
| 5 | Vicuna-13B-v1.5 | 57.30 | - |
| 6 | Mistral-7B-Instruct-v0.1 | 55.00 | - |
| 7 | Llama-2-13B | 54.10 | - |
| 8 | Vicuna-7B-v1.5 | 53.00 | - |
| 9 | Llama-2-7B | 50.60 | - |
| 10 | Llama-2-13B-Chat | 45.00 | - |
| 11 | Llama-2-7B-Chat | 44.60 | - |
| 12 | Falcon-7B | 39.40 | - |
| 13 | Falcon-7B-Instruct | 37.50 | - |
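
The Average Score column aggregates the eight sub-task scores named in the description. The page does not say whether the sub-tasks are weighted, so the sketch below assumes a plain unweighted mean. The function name `average_score` and the example values are hypothetical placeholders, not reported results for any model.

```python
from statistics import mean

# Sub-task names as listed in the leaderboard description.
SUBTASKS = (
    "CyNER", "APTNER", "CyNews", "SecMMLU",
    "CyQuiz", "MITRE", "CVE", "protocol F1",
)


def average_score(scores: dict[str, float]) -> float:
    """Return the unweighted mean over all sub-tasks, rounded to 2 decimals.

    Assumption: every sub-task counts equally. The leaderboard does not
    confirm this weighting.
    """
    missing = [task for task in SUBTASKS if task not in scores]
    if missing:
        raise ValueError(f"missing sub-task scores: {missing}")
    return round(mean(scores[task] for task in SUBTASKS), 2)


# Hypothetical placeholder scores, NOT taken from the leaderboard.
example = {task: 50.0 for task in SUBTASKS}
example["CVE"] = 58.0

print(average_score(example))
```

Raising an error on a missing sub-task, rather than averaging over whatever is present, keeps scores comparable across models: an average over seven sub-tasks would not be the same metric as an average over eight.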