← all datasets

enwik-8

Emerging
3 papers using it
2026 first seen

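The 'enwik-8' dataset (commonly written enwik8) is a benchmark consisting of the first 100 MB (10^8 bytes) of an English Wikipedia XML dump. It is used to evaluate lossless compression algorithms and, in language-modeling work, character-level models, with results typically reported in bits per character.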
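As a rough illustration of how a lossless compressor might be scored on a corpus like this, the sketch below computes compressed bits per input byte using Python's standard zlib module. The function name bits_per_byte and the sample text are hypothetical, and zlib only stands in for whatever compressor is under evaluation; this is a sketch of the general idea, not the benchmark's official scoring procedure.

```python
import zlib


def bits_per_byte(data: bytes, level: int = 9) -> float:
    """Return compressed size in bits divided by original size in bytes."""
    if not data:
        raise ValueError("data must be non-empty")
    compressed = zlib.compress(data, level)
    # Lossless check: decompressing must reproduce the input exactly.
    if zlib.decompress(compressed) != data:
        raise RuntimeError("round trip failed; compression was not lossless")
    return 8 * len(compressed) / len(data)


# Hypothetical stand-in text; for a real measurement, read the dataset file
# as bytes and pass its contents instead.
sample = b"'''Anarchism''' is a [[political philosophy]] of self-governed societies. " * 200

print(f"{bits_per_byte(sample):.3f} bits per byte")
```

Lower values mean better compression. An uncompressed byte costs 8 bits, so any result below 8 indicates that the compressor found redundancy in the input.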

Papers using enwik-8 (3)
