GAIA
Emerging · 10 papers using it
42,218 HF downloads
694 HF likes
First seen: 2025
GAIA dataset: GAIA is a benchmark that aims to evaluate next-generation LLMs (LLMs with augmented capabilities from added tooling, efficient prompting, access to search, etc.). We added gating to prevent bots from scraping the dataset. Please do not reshare the validation or test set in a crawlable format. Data and …
Papers using GAIA (9)
- MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks
- MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents
- Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
- Where LLM Agents Fail and How They can Learn From Failures
- WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking
- Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
- DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems
- Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization