Python benchmarks

Emerging

43 papers using it

First seen: 2018

Papers using Python benchmarks (39)

Benchmarking LLM for Code Smells Detection: OpenAI GPT-4.0 vs DeepSeek-V3 · 2025 · 3 cites

ASTER: Natural and Multi-language Unit Test Generation with LLMs · 2024 · 2 cites

Unify and Triumph: Polyglot, Diverse, and Self-Consistent Generation of Unit Tests with LLMs · 2025 · 1 cite

Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code · 2025 · 1 cite

Babbling Suppression: Making LLMs Greener One Token at a Time · 2026

A framework for assessing the capabilities of code generation of constraint domain-specific languages with large language models · 2026

ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design · 2026

Neuron-Guided Interpretation of Code LLMs: Where, Why, and How? · 2025

BRIDGE: Building Representations In Domain Guided Program Synthesis · 2025

GramTrans: A Better Code Representation Approach in Code Generation · 2025

Challenge on Optimization of Context Collection for Code Completion · 2025

Evaluating Large Language Models for Code Translation: Effects of Prompt Language and Prompt Design · 2025

When Retriever Meets Generator: A Joint Model for Code Comment Generation · 2025

Exploring Generalizable Automated Program Repair with Large Language Models · 2025

Benchmarking Large Language Models for Multi-Language Software Vulnerability Detection · 2025

Understanding the Effectiveness of LLMs in Automated Self-Admitted Technical Debt Repayment · 2025

A General Path-Based Representation for Predicting Program Properties · 2018 · 136 cites

Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code · 2020 · 76 cites

Unsupervised Translation of Programming Languages · 2020 · 62 cites

Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning · 2020 · 52 cites

I Know What You Are Searching For: Code Snippet Recommendation from Stack Overflow Posts · 2022 · 30 cites

Exploiting Method Names to Improve Code Summarization: A Deliberation Multi-Task Learning Approach · 2021 · 24 cites

Syntax and Domain Aware Model for Unsupervised Program Translation · 2023 · 24 cites

Code Execution with Pre-trained Language Models · 2023 · 19 cites

PromSec: Prompt Optimization for Secure Generation of Functional Source Code with Large Language Models (LLMs) · 2024 · 16 cites

MMF3: Neural Code Summarization Based on Multi-Modal Fine-Grained Feature Fusion · 2022 · 10 cites

A Controlled Experiment on the Energy Efficiency of the Source Code Generated by Code Llama · 2024 · 8 cites

CoTran: An LLM-based Code Translator using Reinforcement Learning with Feedback from Compiler and Symbolic Execution · 2023 · 4 cites

Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning · 2024 · 4 cites

Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches · 2020 · 3 cites

COSEA: Convolutional Code Search with Layer-wise Attention · 2020 · 3 cites

CodePlan: Repository-level Coding using LLMs and Planning · 2023 · 3 cites

SynCode: LLM Generation with Grammar Augmentation · 2024 · 3 cites

CodeFusion: A Pre-trained Diffusion Model for Code Generation · 2023 · 2 cites

CodeShell Technical Report · 2024 · 2 cites

Generating Adversarial Computer Programs using Optimized Obfuscations · 2021 · 1 cite

Automated Transpilation of Imperative to Functional Code using Neural-Guided Program Synthesis (Extended Version) · 2022 · 1 cite

TASTY: A Transformer based Approach to Space and Time complexity · 2023 · 1 cite

Precision or Peril: A PoC of Python Code Quality from Quantized Large Language Models · 2024
