
Wikipedia

Emerging
9 papers using it
158,721 HF downloads
1,250 HF likes
2025 first seen

Dataset Card for Wikimedia Wikipedia. Dataset Summary: Wikipedia dataset containing cleaned articles in all languages. The dataset is built from the Wikipedia dumps (https://dumps.wikimedia.org/), with one subset per language, each containing a single train split. Each example contains the content of one full Wikipedia ar
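The "one subset per language, single train split" layout described in the summary can be sketched as follows. This is a minimal sketch, not the dataset card's own instructions. It assumes the Hugging Face `datasets` library is installed, that the repository id is `wikimedia/wikipedia`, and that subsets are named `<dump date>.<language code>` (for example `20231101.en`). None of these details appear in the summary above, and the helper names `subset_name` and `load_wikipedia` are hypothetical.

```python
def subset_name(language: str, dump_date: str = "20231101") -> str:
    """Build a subset (config) name for one language.

    Assumption: subsets follow the "<dump date>.<language code>" pattern;
    the default dump date is only an illustrative value.
    """
    return f"{dump_date}.{language}"


def load_wikipedia(language: str, dump_date: str = "20231101", streaming: bool = True):
    """Load the single train split of one language subset.

    Assumption: the dataset lives at "wikimedia/wikipedia" on the Hugging Face Hub.
    With streaming=True, articles are read lazily instead of downloading the full dump.
    """
    # Imported here so the helper above stays usable without the library installed.
    from datasets import load_dataset

    return load_dataset(
        "wikimedia/wikipedia",
        subset_name(language, dump_date),
        split="train",  # each subset contains a single train split
        streaming=streaming,
    )


if __name__ == "__main__":
    # Requires network access; prints the field names of the first article record.
    articles = load_wikipedia("en")
    first = next(iter(articles))
    print(sorted(first.keys()))
```

Streaming is used in the sketch because a full language subset can be large. Passing `streaming=False` would download and cache the subset instead.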

Papers using Wikipedia (9)
