T-maze
Emerging (2 papers using it)
First seen: 2025
The T-Maze is a synthetic benchmark for evaluating how well models handle long-horizon decision-making in partially observable environments. Its corridors can extend up to one million steps, so information seen early in an episode may have to be retained across the entire traversal.
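To make the memory requirement concrete, here is a minimal Python sketch of a T-maze-style environment. It assumes the classic layout, which the description above does not spell out: a cue shown only at the first step, a straight corridor, and a junction where the correct turn matches the cue. The names `TMaze`, `run_episode`, and `corridor_length`, as well as the observation and reward values, are hypothetical and are not taken from any specific benchmark implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class TMaze:
    """Hypothetical minimal T-maze: the cue is visible only at reset."""
    corridor_length: int = 10
    rng: random.Random = field(default_factory=random.Random)

    def reset(self):
        self.position = 0
        self.goal = self.rng.choice(("left", "right"))
        # The cue is returned once; later observations never repeat it.
        return self.goal

    def step(self, action):
        """Return (observation, reward, done)."""
        if self.position < self.corridor_length:
            # Inside the corridor only "forward" changes the state.
            if action == "forward":
                self.position += 1
            at_junction = self.position == self.corridor_length
            return ("junction" if at_junction else "corridor"), 0.0, False
        # At the junction a turn ends the episode.
        if action in ("left", "right"):
            return "end", (1.0 if action == self.goal else 0.0), True
        return "junction", 0.0, False


def run_episode(env, remember_cue=True):
    """Walk the corridor, then turn using the remembered cue (or guess left)."""
    cue = env.reset()
    memory = cue if remember_cue else None
    obs, reward, done = "corridor", 0.0, False
    while not done:
        if obs == "junction":
            action = memory if memory is not None else "left"
        else:
            action = "forward"
        obs, reward, done = env.step(action)
    return reward
```

An agent that carries the cue to the junction always earns the reward, whereas one that discards it can only guess. Under these assumptions, increasing `corridor_length` lengthens the span over which the cue must be retained without changing anything else about the task.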