
FineWeb-Edu

Emerging
4 papers using it
462,571 HF downloads
1,150 HF likes
2025 first seen

📚 FineWeb-Edu
1.3 trillion tokens of the finest educational data the 🌐 web has to offer

Paper: https://arxiv.org/abs/2406.17557

What is it?
📚 FineWeb-Edu consists of 1.3T tokens (and 5.4T tokens for the FineWeb-Edu-score-2 variant) of educational web pages filtered from the 🍷 FineWeb dataset. This is the 1.3-trillion-token version.
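One way to picture how the two sizes relate is as two thresholds on a per-page educational score. Below is a minimal sketch under stated assumptions: it assumes each FineWeb page carries a rounded educational-quality score on a 0 to 5 scale, that the 1.3T version keeps pages scoring at least 3, and that FineWeb-Edu-score-2 keeps pages scoring at least 2. The `Page` record, its `int_score` and `tokens` fields, the `filter_edu` helper, and the toy corpus are all hypothetical and for illustration only; they are not the dataset's actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    int_score: int  # hypothetical field: rounded educational score, assumed 0-5
    tokens: int     # hypothetical field: token count of the page


def filter_edu(pages, threshold):
    """Keep pages whose educational score meets the threshold (assumed rule)."""
    return [p for p in pages if p.int_score >= threshold]


def total_tokens(pages):
    """Sum the token counts of the kept pages."""
    return sum(p.tokens for p in pages)


# Toy corpus with invented numbers, for illustration only.
corpus = [
    Page("https://example.org/calculus-intro", 4, 1200),
    Page("https://example.org/recipe-blog", 1, 800),
    Page("https://example.org/history-notes", 3, 950),
    Page("https://example.org/forum-thread", 2, 600),
]

edu = filter_edu(corpus, threshold=3)         # stricter cut, like the 1.3T version
edu_score2 = filter_edu(corpus, threshold=2)  # looser cut, like FineWeb-Edu-score-2

print(total_tokens(edu), total_tokens(edu_score2))  # 2150 2750
```

Because the looser threshold keeps every page the stricter one keeps plus some more, the score-2 variant is necessarily a superset, which is consistent with it being the larger (5.4T vs 1.3T tokens) release.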

Papers using FineWeb-Edu (4)
