WebShop
Canonical, 39 papers using it
First seen: 2023
WebShop is a benchmark built around a simulated online shopping environment. It is used to evaluate reinforcement learning algorithms on long-horizon agentic tasks.
Papers using WebShop (39)
- HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry
- SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
- SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories
- Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning
- Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
- What and When to Distill: Selective Hindsight Distillation for Multi-Turn Agents
- Beyond Policy Optimization: A Data Curation Flywheel for Sparse-Reward Long-Horizon Planning
- Retrieval-augmented Hierarchical In-context Reinforcement Learning And Hindsight Modular Reflections For Task Planning With LLMs
- Cross-Environment Neural Reranking for Sample-Efficient Action Selection in Text-Based Agents
- Unified Context Evolution for LLM Agents
- SIRI: Self-Internalizing Reinforcement Learning with Intrinsic Skills for LLM Agent Training
- AdaMEM: Test-Time Adaptive Memory for Language Agents
- From Reward-Hack Activations to Agentic Risk States: Context-Calibrated Mechanistic Monitoring in LLM Agents
- Self-evolving LLM agents with in-distribution Optimization
- 3SPO: State-Score-Supervised Policy Optimization for LLM Agents
- HERO: Hindsight-Enhanced Reflection from Environment Observations for Agentic Self-Distillation
- On-Policy Distillation with Curriculum Turn-level Guidance for Multi-turn Agents
- Hera: Learning Long-Horizon Coordination for Device-Cloud Collaborative LLM Agents
- Proper Scoring Rules for Agentic Uncertainty Quantification
- SKILLC: Learning Autonomous Skill Internalization in LLM Agents via Contrastive Credit Assignment
- Where LLM Agents Fail And How They Can Learn From Failures
- Retrospective Progress-Aware Self-Refinement for LLM Agent Training
- SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
- ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts
- Rewarding Beliefs, Not Actions: Consistency-Guided Credit Assignment for Long-Horizon Agents
- When Denser Credit Is Not Enough: Evidence-Calibrated Policy Optimization for Long-Horizon LLM Agent Training
- Skillnet: Create, Evaluate, And Connect AI Skills
- Grasp: Graph-structured Skill Compositions For LLM Agents
- EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning
- OTora: A Unified Red Teaming Framework for Reasoning-Level Denial-of-Service in LLM Agents
- HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents
- Dynamic Dual-Granularity Skill Bank for Agentic RL
- TSR: Trajectory-Search Rollouts for Multi-Turn RL of LLM Agents
- Meta-RL Induces Exploration in Language Agents
- Graph-Enhanced Policy Optimization in LLM Agent Training
- Enhancing Decision-Making of Large Language Models via Actor-Critic
- Structured Agent Distillation for Large Language Model
- Better Than Your Teacher: LLM Agents That Learn From Privileged AI Feedback
- ADaPT: As-Needed Decomposition and Planning with Language Models