WikiText

Canonical
3 papers using it
1,327,290 HF downloads
713 HF likes
2025 first seen

Dataset Card for "wikitext"

Dataset Summary

The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License. Compared to the preprocessed versi

Papers using WikiText (3)
