
OpenWebText

Emerging
3 papers using it
72,481 HF downloads
520 HF likes
First seen: 2025

Dataset Card for "openwebtext"

Dataset Summary: An open-source replication of OpenAI's WebText dataset, which was used to train GPT-2. This distribution was created by Aaron Gokaslan and Vanya Cohen of Brown University.

Supported Tasks and Leaderboards: More Information Needed

Languages: More Information Needed

Data

Papers using OpenWebText (3)
