
ProteinLMDataset

Emerging
2 papers using it
40 HF downloads
2 HF likes
First seen: 2024

ProteinLMDataset contains 17.46 billion tokens of text and protein sequences, used for self-supervised pretraining and supervised fine-tuning of models for protein engineering tasks.

Papers using ProteinLMDataset (2)
