IFEval
Canonical
12 papers using it
93,921 HF downloads
151 HF likes
First seen: 2024
Dataset Card for IFEval
Dataset Summary
This dataset contains the prompts used in the Instruction-Following Eval (IFEval) benchmark for large language models. It contains around 500 "verifiable instructions", such as "write in more than 400 words" and "mention the keyword of AI at least 3 times", which can be verified by …
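The point of a "verifiable" instruction is that a short program, not a human judge, can decide whether a response complied. As a minimal sketch, here are Python checkers for the two example instructions quoted in the summary. The function names, the regex-based word count, the whole-word keyword matching, and the all-instructions-must-pass aggregation are assumptions made for illustration; this is not the benchmark's official checking code.

```python
import re


def count_words(text: str) -> int:
    # Assumed, simplified word count: runs of word characters (letters, digits, underscore).
    return len(re.findall(r"\b\w+\b", text))


def check_min_words(response: str, minimum: int) -> bool:
    """'Write in more than N words': passes only if the count is strictly above `minimum`."""
    return count_words(response) > minimum


def check_keyword_frequency(response: str, keyword: str, at_least: int) -> bool:
    """'Mention the keyword K at least N times': case-insensitive, whole-word matches (assumed)."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, response, flags=re.IGNORECASE)) >= at_least


def all_followed(response: str) -> bool:
    # Hypothetical aggregation: a prompt counts as followed only if every attached check holds.
    checks = [
        check_min_words(response, 400),
        check_keyword_frequency(response, "AI", 3),
    ]
    return all(checks)


if __name__ == "__main__":
    short = "AI is useful. AI is everywhere. Trust AI."
    print(check_keyword_frequency(short, "AI", 3))  # True: three whole-word matches
    print(check_min_words(short, 400))              # False: only 8 words
    print(all_followed(short))                      # False
```

Because each check is a deterministic function of the response text, scores are reproducible without a model-based grader; the trade-off is that choices such as how to tokenize words or whether "AI" inside another word counts must be fixed in advance.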
Hugging Face · License: apache-2.0
Papers using IFEval (12)
- PaTaRM: Bridging Pairwise and Pointwise Signals via Preference-Aware Task-Adaptive Reward Modeling
- The Price of Format: Diversity Collapse in LLMs
- Boosting Large Language Models with Mask Fine-Tuning
- Revisiting the Reliability of Language Models in Instruction-Following
- Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language Models
- Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
- MM-IFEngine: Towards Multimodal Instruction Following
- LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
- IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards
- Effectively Controlling Reasoning Models through Thinking Intervention
- M-IFEval: Multilingual Instruction-Following Evaluation
- Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following