RUT-Bench
Emerging · 3 papers using it
79 HF downloads
0 HF likes
First seen: 2025
Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions
This repository contains the RUT-Bench benchmark, which consists of 1638 test samples for evaluating LLM agents under realistic user interactions.
Paper: Beyond Ideal Instruction: A Comprehensive Framework for Evaluating L
Hugging Face · License: MIT