HLE
Emerging · 6 papers using it
34,984 HF downloads
836 HF likes
First seen: 2025
[!NOTE] IMPORTANT: Please help us protect the integrity of this benchmark by not publicly sharing, re-uploading, or distributing the dataset.
Humanity's Last Exam
Website | Paper | GitHub
Center for AI Safety & Scale AI
Humanity's Last Exam (HLE) is a multi-modal benchmark at the frontier of human knowledge, desi
Hugging Face · License: MIT
Papers using HLE (6)
- Self-Verified Distillation: Your Language Model Is Secretly Its Own Synthetic Data Pipeline
- MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks
- SCOPE: Prompt Evolution for Enhancing Agent Effectiveness
- EAPO: Enhancing Policy Optimization with On-Demand Expert Assistance
- B-score: Detecting biases in large language models using response history
- A^2FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning