PBench
Emerging
5 papers using it
First seen: 2024
PBench is a benchmark that evaluates the physical realism and action alignment of video-based world models for robotic manipulation, using a curated dataset of three million manipulation clips with physics-aware annotations.
Papers using PBench (5)
- ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment
- Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation
- Perish or Flourish? A Holistic Evaluation of Large Language Models for Code Generation in Functional Programming
- Refining Critical Thinking in LLM Code Generation: A Faulty Premise-based Evaluation Framework
- PyBench: Evaluating LLM Agent on various real-world coding tasks