GAIA

Canonical

44 papers using it

First seen: 2024

Papers using GAIA (44)

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments · 2026

AWorld: Orchestrating the Training Recipe for Agentic AI · 2025 · 11 cites

HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry · 2026

Rethinking Memory as Continuously Evolving Connectivity · 2026

SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating · 2026

Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window · 2025

CodeAgents: A Token-Efficient Framework for Codified Multi-Agent Reasoning in LLMs · 2025 · 4 cites

From Failed Trajectories to Reliable LLM Agents: Diagnosing and Repairing Harness Flaws · 2026

Early Diagnosis of Wasted Computation in Multi-Agent LLM Systems via Failure-Aware Observability · 2026

Characterization of Multi-Model Agentic AI Systems on General Tasks via Trace-Driven Simulation · 2026

ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning · 2026

Towards Direct Latent-Space Synthesis for Parallel Branches in LLM-Agent Workflows · 2026

DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows · 2026

Where LLM Agents Fail And How They Can Learn From Failures · 2025

OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation · 2025 · 1 cite

Infantagent-next: A Multimodal Generalist Agent For Automated Computer Interaction · 2025

Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning · 2026

OpenJarvis: Personal AI, On Personal Devices · 2026

Gaia-v2-lilt: Multilingual Adaptation Of Agent Benchmark Beyond Translation · 2026

Evoroute: Experience-driven Self-routing LLM Agent Systems · 2026

AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration · 2026

ReThinker: Scientific Reasoning by Rethinking with Guided Reflection and Confidence Control · 2026

Learning to Share: Selective Memory for Efficient Parallel Agentic Systems · 2026

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization · 2026

MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks · 2026

WebAnchor: Anchoring Agent Planning to Stabilize Long-Horizon Web Reasoning · 2026

Beyond Rule-Based Workflows: An Information-Flow-Orchestrated Multi-Agents Paradigm via Agent-to-Agent Communication from CORAL · 2026

Yunque DeepResearch Technical Report · 2026

MonoScale: Scaling Multi-Agent System with Monotonic Improvement · 2026

Unifying Dynamic Tool Creation and Cross-Task Experience Sharing through Cognitive Memory Architecture · 2025

FlowSearch: Advancing deep research with dynamic structured knowledge flow · 2025

MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning · 2025

ProtocolBench: Which LLM MultiAgent Protocol to Choose? · 2025

Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems · 2025

Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use · 2025

Gradientsys: A Multi-Agent LLM Scheduler with ReAct Orchestration · 2025

Auto-eval Judge: Towards A General Agentic Framework For Task Completion Evaluation · 2025

Mirothinker: Pushing The Performance Boundaries Of Open-source Research Agents Via Model, Context, And Interactive Scaling · 2025

COMPASS: Enhancing Agent Long-horizon Reasoning With Evolving Context · 2025

Efficient Agents: Building Effective Agents While Reducing Cost · 2025

Researstudio: A Human-intervenable Framework For Building Controllable Deep-research Agents · 2025

Divide, Optimize, Merge: Fine-Grained LLM Agent Optimization at Scale · 2025

Affordable AI Assistants with Knowledge Graph of Thoughts · 2025

Multi-modal Agent Tuning: Building A Vlm-driven Agent For Efficient Tool Usage · 2024
