
MAS-Bench

Emerging
3 papers using it
First seen in 2025

MAS-Bench is a benchmark designed to evaluate GUI-shortcut hybrid agents in the mobile domain, containing 139 complex tasks from 11 real-world applications, a knowledge base of 88 predefined shortcuts, and 9 evaluation metrics.

Papers using MAS-Bench (3)
