ProteinLMDataset
Emerging
2 papers using it
40 HF downloads
2 HF likes
2024 first seen
ProteinLMDataset contains 17.46 billion tokens of text and protein sequences, used for self-supervised pretraining and supervised fine-tuning of models for protein engineering tasks.
Hugging Face · License: MIT