GAIA
Canonical
44 papers using it
First seen: 2024
Papers using GAIA (44)
- EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments
- AWorld: Orchestrating the Training Recipe for Agentic AI
- HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry
- Rethinking Memory as Continuously Evolving Connectivity
- SlimSearcher: Training Efficiency-Aware Web Agents via Adaptive Reward Gating
- Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window
- CodeAgents: A Token-Efficient Framework for Codified Multi-Agent Reasoning in LLMs
- From Failed Trajectories to Reliable LLM Agents: Diagnosing and Repairing Harness Flaws
- Early Diagnosis of Wasted Computation in Multi-Agent LLM Systems via Failure-Aware Observability
- Characterization of Multi-Model Agentic AI Systems on General Tasks via Trace-Driven Simulation
- ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning
- Towards Direct Latent-Space Synthesis for Parallel Branches in LLM-Agent Workflows
- DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows
- Where LLM Agents Fail And How They Can Learn From Failures
- OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
- Infantagent-next: A Multimodal Generalist Agent For Automated Computer Interaction
- Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning
- OpenJarvis: Personal AI, On Personal Devices
- Gaia-v2-lilt: Multilingual Adaptation Of Agent Benchmark Beyond Translation
- Evoroute: Experience-driven Self-routing LLM Agent Systems
- AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
- ReThinker: Scientific Reasoning by Rethinking with Guided Reflection and Confidence Control
- Learning to Share: Selective Memory for Efficient Parallel Agentic Systems
- Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization
- MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks
- WebAnchor: Anchoring Agent Planning to Stabilize Long-Horizon Web Reasoning
- Beyond Rule-Based Workflows: An Information-Flow-Orchestrated Multi-Agents Paradigm via Agent-to-Agent Communication from CORAL
- Yunque DeepResearch Technical Report
- MonoScale: Scaling Multi-Agent System with Monotonic Improvement
- Unifying Dynamic Tool Creation and Cross-Task Experience Sharing through Cognitive Memory Architecture
- FlowSearch: Advancing deep research with dynamic structured knowledge flow
- MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning
- ProtocolBench: Which LLM MultiAgent Protocol to Choose?
- Stop Wasting Your Tokens: Towards Efficient Runtime Multi-Agent Systems
- Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use
- Gradientsys: A Multi-Agent LLM Scheduler with ReAct Orchestration
- Auto-eval Judge: Towards A General Agentic Framework For Task Completion Evaluation
- Mirothinker: Pushing The Performance Boundaries Of Open-source Research Agents Via Model, Context, And Interactive Scaling
- COMPASS: Enhancing Agent Long-horizon Reasoning With Evolving Context
- Efficient Agents: Building Effective Agents While Reducing Cost
- Researstudio: A Human-intervenable Framework For Building Controllable Deep-research Agents
- Divide, Optimize, Merge: Fine-Grained LLM Agent Optimization at Scale
- Affordable AI Assistants with Knowledge Graph of Thoughts
- Multi-modal Agent Tuning: Building A Vlm-driven Agent For Efficient Tool Usage