ALFWorld
Canonical
77 papers using it
First seen: 2023
ALFWorld is a benchmark of interactive environments designed to evaluate reinforcement learning agents on long-horizon tasks with sparse rewards.
Papers using ALFWorld (77)
- HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry
- SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
- Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning
- RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
- GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks
- Autoplan: Automatic Planning Of Interactive Decision-making Tasks With Large Language Models
- What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents
- Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
- Blueprint First, Model Second: A Framework for Deterministic LLM Workflow
- Graph-of-Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills
- KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents
- World Model Implanting for Test-time Adaptation of Embodied Agents
- LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents
- Beyond Policy Optimization: A Data Curation Flywheel for Sparse-Reward Long-Horizon Planning
- Retrieval-augmented Hierarchical In-context Reinforcement Learning And Hindsight Modular Reflections For Task Planning With LLMs
- Skill or Skip? Learning Selective Skill Invocation in Agentic Tasks via Dual-Granularity Preference Learning
- Cross-Environment Neural Reranking for Sample-Efficient Action Selection in Text-Based Agents
- Unified Context Evolution for LLM Agents
- SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training
- SkillDAG: Self-Evolving Typed Skill Graphs for LLM Skill Selection at Scale
- AdaMEM: Test-Time Adaptive Memory for Language Agents
- Self-evolving LLM agents with in-distribution Optimization
- 3SPO: State-Score-Supervised Policy Optimization for LLM Agents
- Organize then Retrieve: Hierarchical Memory Navigation for Efficient Agents
- On-Policy Distillation with Curriculum Turn-level Guidance for Multi-turn Agents
- ACCORD: Action-Conditioned Contextual Grounding for Language Agents
- Training Language Agents to Learn from Experience
- Hera: Learning Long-Horizon Coordination for Device-Cloud Collaborative LLM Agents
- StepOPSD: Step-Aware Online Preference Distillation for Agent Reinforcement Learning
- SKILLC: Learning Autonomous Skill Internalization in LLM Agents via Contrastive Credit Assignment
- Skill-as-Pseudocode: Refactoring Skill Libraries to Pseudocode for LLM Agents
- Honest Lying: Understanding Memory Confabulation in Reflexive Agents
- SkillsInjector: Dynamic Skill Context Construction for LLM Agents
- ExpGraph: Model-Agnostic Experience Learning with Graph-Structured Memory for LLM Agents
- Skill Reuse as Compression in Agentic RL
- Where LLM Agents Fail And How They Can Learn From Failures
- Retrospective Progress-Aware Self-Refinement for LLM Agent Training
- PatchBoard: Schema-Grounded State Mutation for Reliable and Auditable LLM Multi-Agent Collaboration
- SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
- Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning
- Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents
- Rewarding Beliefs, Not Actions: Consistency-Guided Credit Assignment for Long-Horizon Agents
- Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents
- When Denser Credit Is Not Enough: Evidence-Calibrated Policy Optimization for Long-Horizon LLM Agent Training
- Complete Cyclic Subtask Graphs For Tool-using LLM Agents: Flexibility, Cost, And Bottlenecks In Multi-agent Workflows
- TAPE: Tool-guided Adaptive Planning And Constrained Execution In Language Model Agents
- Skillnet: Create, Evaluate, And Connect AI Skills
- ReAct-Diffuse: An Integrated Agentic and Generative Diffusion Framework for Autonomous Multi-Step Task Reasoning and Execution
- Grasp: Graph-structured Skill Compositions For LLM Agents
- EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning
- PRISM: Perception Reasoning Interleaved for Sequential Decision Making
- Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents
- From Actions to Understanding: Conformal Interpretability of Temporal Concepts in LLM Agents
- Ask Only When Needed: Proactive Retrieval from Memory and Skills for Experience-Driven Lifelong Agents
- HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents
- Dynamic Dual-Granularity Skill Bank for Agentic RL
- MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
- Dynamic Mix Precision Routing for Efficient Multi-step LLM Interaction
- Embodied Task Planning via Graph-Informed Action Generation with Large Language Model
- ReflexGrad: Within-Episode Failure Recovery in LLM Agents via Progress-Gated Dual-Process Routing
- PADME: Procedure Aware DynaMic Execution
- NeSyPr: Neurosymbolic Proceduralization For Efficient Embodied Reasoning
- Graph-Enhanced Policy Optimization in LLM Agent Training
- FCRF: Flexible Constructivism Reflection for Long-Horizon Robotic Task Planning with Large Language Models
- Enhancing Decision-Making of Large Language Models via Actor-Critic
- Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search
- Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning
- Autocontext: Instance-level Context Learning For LLM Agents
- Intrinsic Memory Agents: Heterogeneous Multi-agent LLM Systems Through Structured Contextual Memory
- Divide, Optimize, Merge: Fine-Grained LLM Agent Optimization at Scale
- Structured Agent Distillation for Large Language Model
- Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning
- DebFlow: Automating Agent Creation via Agent Debate
- Better Than Your Teacher: LLM Agents That Learn From Privileged AI Feedback
- AdaPlanner: Adaptive Planning from Feedback with Language Models
- AutoPlan: Automatic Planning of Interactive Decision-Making Tasks With Large Language Models
- ADaPT: As-Needed Decomposition and Planning with Language Models