ScienceWorld
Canonical
18 papers using it
2024 first seen
ScienceWorld is a benchmark dataset used to evaluate how well LLM agents orchestrate and execute skills within structured environments.
Papers using ScienceWorld (18)
- RLVMR: Reinforcement Learning with Verifiable Meta-Reasoning Rewards for Robust Long-Horizon Agents
- Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
- Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents
- Blueprint First, Model Second: A Framework for Deterministic LLM Workflow
- Beyond Policy Optimization: A Data Curation Flywheel for Sparse-Reward Long-Horizon Planning
- Cross-Environment Neural Reranking for Sample-Efficient Action Selection in Text-Based Agents
- Self-evolving LLM agents with in-distribution Optimization
- On-Policy Distillation with Curriculum Turn-level Guidance for Multi-turn Agents
- Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents
- Skillnet: Create, Evaluate, And Connect AI Skills
- Grasp: Graph-structured Skill Compositions For LLM Agents
- Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents
- From Actions to Understanding: Conformal Interpretability of Temporal Concepts in LLM Agents
- PADME: Procedure Aware DynaMic Execution
- KnowMap: Efficient Knowledge-Driven Task Adaptation for LLMs
- Unleashing Embodied Task Planning Ability in LLMs via Reinforcement Learning
- Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning
- One STEP at a time: Language Agents are Stepwise Planners