enwik-8
Emerging · 3 papers using it
First seen: 2026
The 'enwik-8' dataset (commonly written enwik8) is a benchmark consisting of the first 100 MB (10^8 bytes) of an English Wikipedia dump. It is used to evaluate lossless compression algorithms, usually by comparing compressed size or bits per byte.
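As a rough illustration of how a lossless compressor might be scored on a corpus like this, the sketch below computes bits per byte and checks that decompression restores the input exactly. It is only a minimal sketch. The helper names `bits_per_byte` and `is_lossless` are hypothetical, the `sample` bytes are a small stand-in rather than the real dataset, and the standard-library `zlib` and `lzma` codecs serve as generic baselines, not as the method of any paper listed here.

```python
import lzma
import zlib
from typing import Callable

Codec = Callable[[bytes], bytes]


def bits_per_byte(data: bytes, compress: Codec) -> float:
    """Return compressed size in bits divided by original size in bytes."""
    if not data:
        raise ValueError("data must be non-empty")
    return 8 * len(compress(data)) / len(data)


def is_lossless(data: bytes, compress: Codec, decompress: Codec) -> bool:
    """Check that a compress/decompress round trip reproduces the input."""
    return decompress(compress(data)) == data


# Stand-in input (assumption): repeated wiki-markup-like text, NOT the real corpus.
sample = ("'''Example''' is a [[sample article]] written in wiki markup.\n" * 2000).encode("utf-8")

for name, comp, decomp in [
    ("zlib", zlib.compress, zlib.decompress),
    ("lzma", lzma.compress, lzma.decompress),
]:
    assert is_lossless(sample, comp, decomp)
    print(f"{name}: {bits_per_byte(sample, comp):.3f} bits per byte")
```

To score the actual benchmark, `sample` could be replaced with the bytes read from a local copy of the file (the path is up to the reader). Lower bits per byte means better compression, and uncompressed data sits at 8.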
Papers using enwik-8 (3)
- SSM Adapters via Hankel Reduced-order Modeling: Injection Site Determines Task Suitability in Long-Context Fine-Tuning
- Harmonic: Hierarchical State Space Models for Efficient Long-Context Language Modeling
- Nacrith: Neural Lossless Compression via Ensemble Context Modeling and High-Precision CDF Coding