AppWorld
Emerging · 7 papers using it
First seen: 2025
AppWorld is a benchmark dataset consisting of a collection of applications and their associated metadata. It is used to evaluate language model agents, in particular how well they learn from agentic traces in a parallel execution context.
Papers using AppWorld (7)
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
- Toward Scalable Verifiable Reward: Proxy State-based Evaluation For Multi-turn Tool-calling LLM Agents
- Hera: Learning Long-Horizon Coordination for Device-Cloud Collaborative LLM Agents
- Combee: Scaling Prompt Learning for Self-Improving Language Model Agents
- ACON: Optimizing Context Compression for Long-horizon LLM Agents
- Reinforcement Learning for Self-Improving Agent with Skill Library