
The Stack

Canonical
7 papers using it
20,071 HF downloads
1,020 HF likes
2023 first seen

Dataset Card for The Stack

Changelog

Release | Description
v1.0 | Initial release of The Stack. Included 30 programming languages and 18 permissive licenses. Note: three of the included licenses (MPL/EPL/LGPL) are considered weak copyleft licenses. The resulting near-deduplicated dataset is 3 TB in size.
v1.1 | The three copyleft li

Papers using The Stack (7)

The Stack - datasets - ai-for-code