
CommonCrawl

Emerging
3 papers using it
2022 first seen

CommonCrawl is a vast archive of crawled web pages, used to evaluate large language models on HTML understanding tasks such as parsing pages and automating web-based tasks.

Papers using CommonCrawl (3)
