GAIA

Emerging

10 papers using it

42,218 HF downloads

694 HF likes

2025 first seen

GAIA dataset

GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.). We added gating to prevent bots from scraping the dataset. Please do not reshare the validation or test set in a crawlable format. Data and

🤗 Hugging Face

Papers using GAIA (9)

MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks (2026)

MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents (2025)

Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution (2025)

AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs (2025)

Where LLM Agents Fail and How They can Learn From Failures (2025)

WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking (2025)

Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO (2025)

DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems (2025)

Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization (2025)