OpenWebText
Emerging · 3 papers using it
72,481 HF downloads
520 HF likes
First seen: 2025
Dataset Card for "openwebtext"

Dataset Summary
An open-source replication of the WebText dataset from OpenAI that was used to train GPT-2. This distribution was created by Aaron Gokaslan and Vanya Cohen of Brown University.

Supported Tasks and Leaderboards
More Information Needed

Languages
More Information Needed

Data
🤗 Hugging Face · License: cc0-1.0