The Stack v2
Emerging, 5 papers using it
First seen: 2024
The Stack v2 is a code dataset for language modeling that covers a wide range of programming languages. It is used to train and evaluate large language models on code generation and code understanding.
Papers using The Stack v2 (5)
- Cracks in The Stack: Hidden Vulnerabilities and Licensing Risks in LLM Pre-Training Datasets
- When to Ponder: Adaptive Compute Allocation for Code Generation via Test-Time Training
- StarCoder 2 and The Stack v2: The Next Generation
- Enhancing Cross-Language Code Translation via Task-Specific Embedding Alignment in Retrieval-Augmented Generation
- StarCoder 2 and The Stack v2: The Next Generation