
BPE-tokenized OpenWebText

Emerging
1 paper using it
First seen: 2026

BPE-tokenized OpenWebText is a dataset of OpenWebText text that has been tokenized with Byte Pair Encoding (BPE). It is used to evaluate model performance on language tasks.
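As a rough illustration of what BPE tokenization does, the sketch below learns merge rules from a tiny word list and applies them to new words. It is a simplified character-level version written for this page. The function names (`learn_bpe_merges`, `merge_pair`, `apply_bpe`) and the toy corpus are hypothetical, and it is not the tokenizer or vocabulary used to build this dataset, which the entry does not specify.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy, character-level)."""
    # Each distinct word starts as a tuple of single characters, with its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Take the most frequent pair; ties go to the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the new merged symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Tokenize `word` by applying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    toy_corpus = ["low", "low", "lower", "lowest"]  # assumed toy data, not OpenWebText
    merges = learn_bpe_merges(toy_corpus, num_merges=2)
    print(merges)
    print(apply_bpe("lowly", merges))
```

Production tokenizers typically operate on bytes rather than characters and learn tens of thousands of merges, so token boundaries on real text will differ from this toy.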
