WikiText

Name: WikiText
License: cc-by-sa-3.0

Canonical

3 papers using it

1,327,290 HF downloads

713 HF likes

2025 first seen

Dataset Card for "wikitext"

Dataset Summary

The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. The dataset is available under the Creative Commons Attribution-ShareAlike License. Compared to the preprocessed versi
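As a rough illustration of working with the dataset described in the summary, the sketch below counts whitespace-separated tokens in a list of text lines and, separately, loads one split through the Hugging Face `datasets` library. The Hub id `Salesforce/wikitext`, the config name `wikitext-2-raw-v1`, and the helper names `count_tokens` and `load_wikitext_lines` are assumptions made for this example and are not taken from the listing above. The sample line is illustrative only.

```python
from typing import Iterable, List


def count_tokens(lines: Iterable[str]) -> int:
    """Count whitespace-separated tokens across an iterable of text lines."""
    return sum(len(line.split()) for line in lines)


def load_wikitext_lines(config: str = "wikitext-2-raw-v1",
                        split: str = "validation") -> List[str]:
    """Return the text lines of one split (needs network access and `datasets`)."""
    # Imported lazily so count_tokens stays usable without the library installed.
    from datasets import load_dataset

    # Assumption: the dataset is published on the Hub under this id and config name.
    dataset = load_dataset("Salesforce/wikitext", config, split=split)
    return list(dataset["text"])


if __name__ == "__main__":
    # Illustrative sample in the style of a heading line followed by a blank line.
    sample = [" = Valkyria Chronicles III = ", ""]
    print(count_tokens(sample))
```

With network access, `count_tokens(load_wikitext_lines())` would give a whitespace-token count for the chosen split. That count depends on the config and split, so it should not be expected to match the "over 100 million tokens" figure, which describes the collection as a whole.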

🤗 Hugging Face · ⚖ cc-by-sa-3.0

Papers using WikiText (3)

Long-Context Modeling via GSS-Transformer Hybrid Architecture with Learnable Mixing (2026)

Memorization Dynamics in Knowledge Distillation for Language Models (2026)

Tackling Distribution Shift in LLM via KILO: Knowledge-Instructed Learning for Continual Adaptation (2025)