CommonCrawl
Emerging · 3 papers using it
First seen: 2022
CommonCrawl is an openly available archive of web pages collected through large-scale crawling. In the papers that use it, it serves as a source of real-world HTML for evaluating large language models on HTML-understanding tasks, such as parsing pages and automating web-based tasks.
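CommonCrawl distributes raw pages as WARC records, in which each HTTP response (status line, headers, and HTML body) is wrapped in a block of WARC headers. The sketch below illustrates only the first step of an HTML-understanding pipeline: recovering the HTML from one record and parsing a simple field out of it. It makes several assumptions. It handles a single uncompressed record held in memory, while real CommonCrawl files are gzip-compressed and are better read with a dedicated WARC library. It ignores chunked or compressed HTTP bodies. The helper names (`split_warc_record`, `http_body`, `page_title`) and the sample record are hypothetical illustrations and are not part of any CommonCrawl tooling.

```python
from html.parser import HTMLParser
from typing import Dict, Optional, Tuple


def split_warc_record(record: bytes) -> Tuple[Dict[str, str], bytes]:
    """Split one uncompressed WARC record into (WARC headers, content block)."""
    # The WARC header section ends at the first blank line (CRLF CRLF).
    head, _, rest = record.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    headers: Dict[str, str] = {}
    for line in lines[1:]:  # lines[0] is the version line, e.g. "WARC/1.0"
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    # Content-Length gives the exact size of the block; trailing CRLFs are dropped.
    length = int(headers["Content-Length"])
    return headers, rest[:length]


def http_body(block: bytes) -> bytes:
    """Strip the HTTP status line and headers from a 'response' record's block."""
    _, _, body = block.partition(b"\r\n\r\n")
    return body


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(record: bytes) -> Optional[str]:
    """Return the <title> text of the HTML page stored in a WARC response record."""
    headers, block = split_warc_record(record)
    if headers.get("WARC-Type") != "response":
        return None
    parser = TitleParser()
    parser.feed(http_body(block).decode("utf-8", errors="replace"))
    parser.close()
    return parser.title


# Hypothetical sample record, built by hand for illustration.
SAMPLE_HTML = b"<html><head><title>Example page</title></head><body><p>Hi</p></body></html>"
SAMPLE_HTTP = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n" + SAMPLE_HTML
SAMPLE_RECORD = (
    b"WARC/1.0\r\n"
    + b"WARC-Type: response\r\n"
    + b"WARC-Target-URI: http://example.com/\r\n"
    + b"Content-Length: " + str(len(SAMPLE_HTTP)).encode("ascii") + b"\r\n"
    + b"\r\n"
    + SAMPLE_HTTP
    + b"\r\n\r\n"
)

if __name__ == "__main__":
    print(page_title(SAMPLE_RECORD))
```

Slicing the block with `Content-Length` rather than searching for the next blank line keeps the parser correct when the HTML body itself contains blank lines. The same split also gives a model-evaluation harness access to the page's target URI and HTTP headers alongside the HTML.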