RUT-Bench

Name: RUT-Bench
License: MIT

Emerging

3 papers using it

79 HF downloads

0 HF likes

First seen: 2025

Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions. This repository contains the RUT-Bench benchmark, which consists of 1638 test samples for evaluating LLM agents under realistic user interactions. Paper: Beyond Ideal Instruction: A Comprehensive Framework for Evaluating L


Papers using RUT-Bench (3)

Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions (2026)

CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation (2025)