Wikipedia

Name: Wikipedia
License: cc-by-sa-3.0

Emerging

9 papers using it

158,721 HF downloads

1,250 HF likes

First seen: 2025

Dataset Card for Wikimedia Wikipedia. Dataset Summary: Wikipedia dataset containing cleaned articles of all languages. The dataset is built from the Wikipedia dumps (https://dumps.wikimedia.org/) with one subset per language, each containing a single train split. Each example contains the content of one full Wikipedia ar
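The "one subset per language, single train split" layout described above can be sketched in Python. This is a minimal sketch under assumptions, not the dataset's documented usage: the repository id "wikimedia/wikipedia" and the "<dump date>.<language code>" subset naming (for example "20231101.en") are not stated on this page, and the helper names subset_name and load_language_subset are hypothetical. It relies on load_dataset from the Hugging Face datasets library.

```python
def subset_name(dump_date: str, language: str) -> str:
    """Build a per-language subset name, e.g. '20231101.en'.

    Assumption: subsets are named '<dump date>.<language code>'.
    """
    return f"{dump_date}.{language}"


def load_language_subset(dump_date: str, language: str,
                         repo_id: str = "wikimedia/wikipedia"):
    """Return the single train split of one language subset.

    Assumption: repo_id is the Hugging Face repository for this dataset.
    Requires network access and the `datasets` package.
    """
    # Imported lazily so subset_name() works without the library installed.
    from datasets import load_dataset

    # streaming=True iterates over examples without downloading the
    # whole subset first.
    return load_dataset(
        repo_id,
        subset_name(dump_date, language),
        split="train",
        streaming=True,
    )
```

Calling load_language_subset("20231101", "en") would then yield one example per article, assuming that dump date is available for the chosen language.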

🤗 Hugging Face | ⚖ cc-by-sa-3.0

Papers using Wikipedia (9)

Query-focused and Memory-aware Reranker for Long Context Processing (2026)

What am I missing here?: Evaluating Large Language Models for Masked Sentence Prediction (2025)

Is External Information Useful for Stance Detection with LLMs? (2025)

Enhancing RAG Efficiency with Adaptive Context Compression (2025)

Evaluating the Retrieval Robustness of Large Language Models (2025)

From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs (2025)

TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining (2025)

LLM Enhancer: Merged Approach using Vector Embedding for Reducing Large Language Model Hallucinations with External Knowledge (2025)

RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions (2025)