
ALFWorld

Canonical

99 papers using it

First seen in 2023

ALFWorld is a dataset and benchmark designed to evaluate the ability of language agents to ground their actions in contextual information from their environment.


Papers using ALFWorld (89)

SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning · 2026

HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry · 2026

Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning · 2026

TurnOPD: Making On-Policy Distillation Turn-Aware for Efficient Long-Horizon Agent Training · 2026

RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents · 2025 · 35 cites

SKILLC: Learning Autonomous Skill Internalization in LLM Agents via Contrastive Credit Assignment · 2026 · 2 cites

Blueprint First, Model Second: A Framework for Deterministic LLM Workflow · 2025 · 6 cites

What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents · 2026

Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks · 2025

MetaSkill-Evolve: Recursive Self-Improvement of LLM Agents via Two-Timescale Meta-Skill Evolution · 2026

BiPACE: Bisimulation-Guided Policy Optimization with Action Counterfactual Estimation for LLM Agents · 2026

Graph-of-Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills · 2026 · 1 cite

KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents · 2024 · 4 cites

World Model Implanting for Test-time Adaptation of Embodied Agents · 2025 · 4 cites

LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents · 2026

Multi-Agent Transactive Memory · 2026

Beyond Policy Optimization: A Data Curation Flywheel for Sparse-Reward Long-Horizon Planning · 2025 · 3 cites

No Time Like the Present: Agentic Test-Time Training for LLM Agents · 2026

RSPO: Reward-Swap Policy Optimization for Multi-Turn LLM Agents · 2026

STAPO: Selective Trajectory-Aware Policy Optimization for LLM Agent Training · 2026

Task Decomposition-Guided Reranking for Adaptive Agent Skill Retrieval · 2026

Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning · 2026

Unified Context Evolution for LLM Agents · 2026

SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training · 2026

SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale · 2026

AdaMEM: Test-Time Adaptive Memory for Language Agents · 2026

Self-evolving LLM agents with in-distribution Optimization · 2026

3SPO: State-Score-Supervised Policy Optimization for LLM Agents · 2026

Organize then Retrieve: Hierarchical Memory Navigation for Efficient Agents · 2026

On-Policy Distillation with Curriculum Turn-level Guidance for Multi-turn Agents · 2026

ACCORD: Action-Conditioned Contextual Grounding for Language Agents · 2026

EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning · 2026

Uncertainty Decomposition for Clarification Seeking in LLM Agents · 2026

The Interplay of Harness Design and Post-Training in LLM Agents · 2026

Semantic Consistency Policy Optimization for Reinforcement Learning of LLM Agents · 2026

Joint Learning of Experiential Rules and Policies for Large Language Model Agents · 2026

ATOD: Annealed Turn-Aware On-Policy Distillation for Multi-Turn Agentic Tasks · 2026

DuoMem: Towards Capable On-Device Memory Agents via Dual-Space Distillation · 2026

Training Language Agents to Learn from Experience · 2026

Hera: Learning Long-Horizon Coordination for Device-Cloud Collaborative LLM Agents · 2026

StepOPSD: Step-Aware Online Preference Distillation for Agent Reinforcement Learning · 2026

Skill-as-Pseudocode: Refactoring Skill Libraries to Pseudocode for LLM Agents · 2026

Honest Lying: Understanding Memory Confabulation in Reflexive Agents · 2026

SkillsInjector: Dynamic Skill Context Construction for LLM Agents · 2026

ExpGraph: Model-Agnostic Experience Learning with Graph-Structured Memory for LLM Agents · 2026

Skill Reuse as Compression in Agentic RL · 2026

Where LLM Agents Fail And How They Can Learn From Failures · 2025

Spinning Straw into Gold: Relabeling LLM Agent Trajectories in Hindsight for Successful Demonstrations · 2026

Progress- and Reliability-Oriented Group Policy Optimization for Agentic Reinforcement Learning · 2026

Retrospective Progress-Aware Self-Refinement for LLM Agent Training · 2026

PatchBoard: Schema-Grounded State Mutation for Reliable and Auditable LLM Multi-Agent Collaboration · 2026

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization · 2026

Rewarding Beliefs, Not Actions: Consistency-Guided Credit Assignment for Long-Horizon Agents · 2026

Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents · 2026

When Denser Credit Is Not Enough: Evidence-Calibrated Policy Optimization for Long-Horizon LLM Agent Training · 2026

Selective Memory Retention for Long-Horizon LLM Agents · 2026

UCOB: Learning to Utilize and Evolve Agentic Skills via Credit-Aware On-Policy Bidirectional Self-Distillation · 2026

Self-Evolving World Models for LLM Agent Planning · 2026

Complete Cyclic Subtask Graphs For Tool-using LLM Agents: Flexibility, Cost, And Bottlenecks In Multi-agent Workflows · 2026

TAPE: Tool-guided Adaptive Planning And Constrained Execution In Language Model Agents · 2026

ReAct-Diffuse: An Integrated Agentic and Generative Diffusion Framework for Autonomous Multi-Step Task Reasoning and Execution · 2026

Grasp: Graph-structured Skill Compositions For LLM Agents · 2026

Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning · 2026

Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents · 2026

Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents · 2026

From Actions to Understanding: Conformal Interpretability of Temporal Concepts in LLM Agents · 2026

Ask Only When Needed: Proactive Retrieval from Memory and Skills for Experience-Driven Lifelong Agents · 2026

DPEPO: Diverse Parallel Exploration Policy Optimization for LLM-based Agents · 2026

HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents · 2026

Dynamic Dual-Granularity Skill Bank for Agentic RL · 2026

Skillnet: Create, Evaluate, And Connect AI Skills · 2026

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents · 2026

Dynamic Mixed-Precision Routing for Efficient Multi-step LLM Interaction · 2026

Embodied Task Planning via Graph-Informed Action Generation with Large Language Models · 2026

ReflexGrad: Within-Episode Failure Recovery in LLM Agents via Progress-Gated Dual-Process Routing · 2025

PADME: Procedure Aware DynaMic Execution · 2025

NeSyPr: Neurosymbolic Proceduralization For Efficient Embodied Reasoning · 2025

Graph-Enhanced Policy Optimization in LLM Agent Training · 2025

Autocontext: Instance-level Context Learning For LLM Agents · 2025

Intrinsic Memory Agents: Heterogeneous Multi-agent LLM Systems Through Structured Contextual Memory · 2025

Fact-Augmented Lookahead Planning for LLM Agents · 2025

Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning · 2025

Divide, Optimize, Merge: Fine-Grained LLM Agent Optimization at Scale · 2025

Structured Agent Distillation for Large Language Model · 2025

Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning · 2025

DebFlow: Automating Agent Creation via Agent Debate · 2025

Better Than Your Teacher: LLM Agents That Learn From Privileged AI Feedback · 2024 · 1 cite

AdaPlanner: Adaptive Planning from Feedback with Language Models · 2023 · 13 cites

ADaPT: As-Needed Decomposition and Planning with Language Models · 2023 · 2 cites
