Datasets - Awesome Federated Learning
341 datasets & benchmarks: 17 canonical foundations plus emerging datasets mined from recent papers. Each links to the papers that use it.
Dataset Card for BEIR Benchmark. hotpotqa is one of the datasets from the Question Answering task within BEIR, measuring Wikipedia article retrieval for a given multi-hop query. Dataset Summary: BEIR is a heterogeneous benchmark built from 18 diverse datasets representing 9 information retrieval tasks. Fact-checking: FEVER, Climate-FEVER, SciFact. Question-Answering: NQ, HotpotQA, FiQA-2018. Bio-Medical IR: TREC-COVID, BioASQ, NFCorpus. News Retrieval: TREC-NEWS, Robust04… See the full description on the dataset page: https://huggingface.co/datasets/BeIR/hotpotqa.
AIME 2025 Dataset. Dataset Description: This dataset contains problems from the American Invitational Mathematics Examination (AIME) 2025-I & II.
Warning: The leaderboard above is generated by Hugging Face eval-results and may be incomplete until evaluation_framework: benchflow is accepted and deployed. The audited SkillsBench v1.1 result archive is https://huggingface.co/datasets/benchflow/skillsbench-leaderboard, with compact official exports under leaderboard/skillsbench/v1.1/. Warning: The dataset is a read-only mirror. The primary source for this benchmark is on GitHub: https://github.com/benchflow-ai/skillsbench. Open issues and… See the full description on the dataset page: https://huggingface.co/datasets/benchflow/skillsbench.
VBVR-Bench. Re-hosted copy of Video-Reason/VBVR-Bench-Data, converted to standard HuggingFace parquet format. Splits: in_domain: 50 tasks x 5 samples = 250 entries (tasks overlap with the VBVR training set); out_of_domain: 50 tasks x 5 samples = 250 entries (held-out reasoning tasks). Schema (field, type, notes): task_name, string, e.g. G-13_grid_number_sequence_data-generator; video_idx, string, zero-padded sample id (00000..00004); domain, string… See the full description on the dataset page: https://huggingface.co/datasets/pufanyi/VBVR-Bench.
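The split arithmetic and the video_idx convention in the VBVR-Bench card can be checked with a small sketch. It assumes only what the card states: 50 tasks per split, 5 samples per task, and five-digit zero-padded sample ids. The `Entry` class, the helper functions, and the placeholder task names are hypothetical and are not part of the dataset or any library; the truncated `domain` field is left out because its description is cut off in the card.

```python
from dataclasses import dataclass
from typing import List

TASKS_PER_SPLIT = 50   # stated in the card
SAMPLES_PER_TASK = 5   # stated in the card


@dataclass(frozen=True)
class Entry:
    """Hypothetical record holding the two fully described schema fields."""
    task_name: str  # e.g. "G-13_grid_number_sequence_data-generator"
    video_idx: str  # zero-padded sample id, "00000" through "00004"


def entries_for_task(task_name: str) -> List[Entry]:
    """Build one entry per sample, with a five-digit zero-padded id."""
    return [Entry(task_name, f"{i:05d}") for i in range(SAMPLES_PER_TASK)]


def build_split(task_names: List[str]) -> List[Entry]:
    """Flatten all per-task entries into a single split."""
    return [entry for name in task_names for entry in entries_for_task(name)]


if __name__ == "__main__":
    # Placeholder task names; the real names come from the dataset itself.
    names = [f"task_{i}" for i in range(TASKS_PER_SPLIT)]
    split = build_split(names)
    print(len(split))  # 50 tasks x 5 samples
```

Because video_idx is a string rather than an integer, comparisons and joins against it should use the padded form (for example "00003", not 3).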