AppWorld

Emerging

7 papers using it

First seen: 2025

AppWorld is a benchmark consisting of a collection of applications and their associated metadata. It is used to evaluate language model agents, including how well they learn from agentic traces in a parallel execution context.

Papers using AppWorld (7)

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models · 2025

Toward Scalable Verifiable Reward: Proxy State-based Evaluation For Multi-turn Tool-calling LLM Agents · 2026 · 2 cites

Hera: Learning Long-Horizon Coordination for Device-Cloud Collaborative LLM Agents · 2026

Combee: Scaling Prompt Learning for Self-Improving Language Model Agents · 2026

ACON: Optimizing Context Compression for Long-horizon LLM Agents · 2025

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models · 2025

Reinforcement Learning for Self-Improving Agent with Skill Library · 2025