Benchmarking World-model Learning
2025 · Archana Warrier, Dat Nguyen, Michelangelo Naim, et al.
Abstract
Model-learning agents should gather information to learn world models that support many downstream tasks and inferences, such as predicting unobserved states, estimating near- and far-term consequences of actions, planning action sequences, and detecting changes in dynamics. Current methods for learning and evaluating world models diverge from this goal: training and evaluation are anchored to next-frame prediction, and success is scored by reward maximization in the same environment. We propose WorldTest, a protocol to evaluate model-learning agents that separates reward-free interaction from a scored test phase in a different but related environment. WorldTest is open-ended (models should support many different tasks unknown ahead of time) and agnostic to model representation, allowing comparison across approaches. We instantiated WorldTest with AutumnBench, a suite of 43 interactive grid-world environments and 129 tasks across three families
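The two-phase structure described in the abstract (reward-free interaction, then a scored test in a different but related environment) can be illustrated with a minimal sketch. Everything below is hypothetical and chosen for illustration only: `LineWorld`, `SweepAgent`, `rollout_challenge`, and `world_test` are not part of WorldTest or AutumnBench, and the toy one-dimensional world stands in for the paper's grid-world environments.

```python
from typing import Callable, Dict, List, Tuple


class LineWorld:
    """Hypothetical toy environment: an agent on a line of `size` cells."""

    def __init__(self, size: int = 5) -> None:
        self.size = size
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> int:
        # action is -1 or +1; walls clamp the position. No reward is returned.
        self.state = min(self.size - 1, max(0, self.state + action))
        return self.state


class SweepAgent:
    """Hypothetical agent that learns a tabular transition model."""

    def __init__(self) -> None:
        self.model: Dict[Tuple[int, int], int] = {}

    def interact(self, env: LineWorld, budget: int) -> None:
        # Reward-free exploration: sweep right for half the budget, then left.
        s = env.reset()
        for t in range(budget):
            a = 1 if t < budget // 2 else -1
            s_next = env.step(a)
            self.model[(s, a)] = s_next
            s = s_next

    def predict(self, state: int, actions: List[int]) -> int:
        # Roll the learned model forward; unseen transitions leave the state unchanged.
        for a in actions:
            state = self.model.get((state, a), state)
        return state


def rollout_challenge(agent: SweepAgent) -> float:
    """Scored test phase: multi-step predictions from start cells other than
    the reset cell, so the test setting is related to, but not identical to,
    the interaction setting."""
    cases = [(2, [1, 1, 1]), (4, [-1, -1]), (0, [-1, 1]), (3, [1, -1, -1])]
    env = LineWorld()
    correct = 0
    for start, actions in cases:
        env.state = start
        for a in actions:
            env.step(a)
        correct += int(agent.predict(start, actions) == env.state)
    return correct / len(cases)


def world_test(agent: SweepAgent, base_env: LineWorld,
               challenge: Callable[[SweepAgent], float], budget: int = 20) -> float:
    agent.interact(base_env, budget)  # phase 1: interaction without rewards
    return challenge(agent)           # phase 2: scored on a derived challenge


if __name__ == "__main__":
    print(world_test(SweepAgent(), LineWorld(), rollout_challenge))
```

Because the challenge is passed in as a function that only calls `predict`, the scoring step does not depend on how the agent represents its model, which loosely mirrors the representation-agnostic design the abstract describes.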
Related papers
- Smallworlds: Assessing Dynamics Understanding Of World Models In Isolated Environments (2025)
- Foundation World Models For Agents That Learn, Verify, And Adapt Reliably Beyond Static Environments (2026)
- World Models As An Intermediary Between Agents And The Real World (2026)
- Learning To Predict Without Looking Ahead: World Models Without Forward Prediction (2019)
- The Effectiveness Of World Models For Continual Reinforcement Learning (2022)
- Active World Model Learning With Progress Curiosity (2020)
- Benchmark Environments For Multitask Learning In Continuous Domains (2017)
- Smaller World Models For Reinforcement Learning (2020)