Wikipedia
Emerging, 9 papers using it
158,721 HF downloads
1,250 HF likes
2025 first seen
Dataset Card for Wikimedia Wikipedia
Dataset Summary
Wikipedia dataset containing cleaned articles of all languages. The dataset is built from the Wikipedia dumps (https://dumps.wikimedia.org/) with one subset per language, each containing a single train split. Each example contains the content of one full Wikipedia ar
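The subset layout described in the summary (one subset per language, each with a single train split) can be sketched with the Hugging Face `datasets` library. This is a minimal sketch under stated assumptions, not the card's own instructions: the repository id `wikimedia/wikipedia`, the dump date `20231101`, and the `<dump_date>.<lang>` subset naming are assumptions that do not appear in the text above, and the helper names are hypothetical. Check the dataset page for the subsets that actually exist.

```python
# Assumed values, not stated in the text above; verify them on the dataset page.
REPO_ID = "wikimedia/wikipedia"   # assumed Hub repository id
DEFAULT_DUMP = "20231101"         # assumed dump date


def subset_name(lang: str, dump_date: str = DEFAULT_DUMP) -> str:
    """Build the per-language subset name, assuming a '<dump_date>.<lang>' scheme."""
    if not lang:
        raise ValueError("lang must be a non-empty language code")
    return f"{dump_date}.{lang}"


def load_language_subset(lang: str, dump_date: str = DEFAULT_DUMP, streaming: bool = True):
    """Load the single train split of one language subset.

    Streaming is on by default so nothing large is downloaded up front.
    The import is local so subset_name() works without `datasets` installed.
    """
    from datasets import load_dataset

    return load_dataset(
        REPO_ID,
        subset_name(lang, dump_date),
        split="train",
        streaming=streaming,
    )
```

With network access and `datasets` installed, `load_language_subset("en")` would return an iterable over the English subset, and `next(iter(...))` would yield one example, that is, one article.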
Hugging Face · cc-by-sa-3.0
Papers using Wikipedia (9)
- Query-focused and Memory-aware Reranker for Long Context Processing
- What am I missing here?: Evaluating Large Language Models for Masked Sentence Prediction
- Is External Information Useful for Stance Detection with LLMs?
- Enhancing RAG Efficiency with Adaptive Context Compression
- Evaluating the Retrieval Robustness of Large Language Models
- From Surveys to Narratives: Rethinking Cultural Value Adaptation in LLMs
- TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
- LLM Enhancer: Merged Approach using Vector Embedding for Reducing Large Language Model Hallucinations with External Knowledge
- RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions