Awesome Code Generation
Code Generation is one of the most active areas in Awesome AI for Code, with 4,404 papers in this collection; common evaluation datasets include HumanEval, MBPP, and Spider. A strong starting point is "A Survey on Large Language Models for Code Generation".
Datasets & benchmarks
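HumanEval and MBPP are commonly scored with pass@k, the probability that at least one of k sampled programs passes a problem's unit tests. The sketch below implements the unbiased pass@k estimator introduced alongside HumanEval, 1 - C(n-c, k) / C(n, k). It assumes n samples were generated per problem and c of them passed. The function names `pass_at_k` and `benchmark_pass_at_k` are illustrative only and do not come from any particular evaluation harness. Spider is a text-to-SQL benchmark with its own matching metrics, so this estimator does not apply to it.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (illustrative helper).

    n: number of samples generated for the problem
    c: number of those samples that passed all unit tests
    k: sample budget the metric assumes
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples failed, every size-k subset contains a pass.
    if n - c < k:
        return 1.0
    # 1 - P(all k drawn samples fail) = 1 - C(n-c, k) / C(n, k)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k; `results` holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Two hypothetical problems: 3 of 10 samples pass, and 10 of 10 pass.
    print(benchmark_pass_at_k([(10, 3), (10, 10)], k=1))
```

Exact integer binomials via `math.comb` keep the sketch short. For very large n, a product-form computation is a common alternative that avoids huge intermediate integers.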
Key papers
- A Survey on Large Language Models for Code Generation (2024) · Juyong Jiang et al. · 9.98
- How Can ChatGPT Support Human Security Testers to Help Mitigate Supply Chain Attacks? (2023) · Ying Zhang et al. · 8.58
- REPOT: Recoverable Program-of-Thought via Checkpoint Repair (2026) · Parsa Mazaheri · 8.51
- CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation (2026) · Weinan Dai et al. · 8.17
- An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities (2025) · Zezhou Yang et al. · 7.29
- PlotGen: Multi-Agent LLM-based Scientific Data Visualization via Multimodal Feedback (2025) · Kanika Goswami et al. · 7.06
- Spec2RTL-Agent: Automated Hardware Code Generation from Complex Specifications Using LLM Agent Systems (2025) · Zhongzhi Yu et al. · 7.06
- IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities (2024) · Ziyang Li et al. · 7.00
- RTL++: Graph-enhanced LLM for RTL Code Generation (2025) · Mohammad Akyash et al. · 6.58
- GPIoT: Tailoring Small Language Models for IoT Program Synthesis and Development (2025) · Leming Shen et al. · 6.47
- A Showdown of ChatGPT vs DeepSeek in Solving Programming Tasks (2025) · Ronas Shakya et al. · 6.23
- AI-Powered, But Power-Hungry? Energy Efficiency of LLM-Generated Code (2025) · Lola Solovyeva et al. · 6.17
- Bridging the Editing Gap in LLMs: FineEdit for Precise and Targeted Text Modifications (2025) · Yiming Zeng et al. · 6.17
- Security Weaknesses of Copilot-Generated Code in GitHub Projects: An Empirical Study (2023) · Yujia Fu et al. · 6.13
- A Survey on Evaluating Large Language Models in Code Generation Tasks (2024) · Liguo Chen et al. · 6.08
- VRank: Enhancing Verilog Code Generation from Large Language Models via Self-Consistency (2025) · Zhuorui Zhao et al. · 5.90
- Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models (2025) · Marc Bruni et al. · 5.90
- Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization (2026) · Anmol Agarwal et al. · 5.81
- Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models (2025) · Anastasiia Grishina et al. · 5.65
- Translating Regulatory Clauses into Executable Codes for Building Design Checking via Large Language Model Driven Function Matching and Composing (2023) · Zhe Zheng et al. · 5.63
- The Impact of Large Language Models on Open-source Innovation: Evidence from GitHub Copilot (2024) · Doron Yeverechyahu et al. · 5.62
- COFFE: A Code Efficiency Benchmark for Code Generation (2025) · Yun Peng et al. · 5.59
- LLM-Generated Microservice Implementations from RESTful API Definitions (2025) · Saurabh Chauhan et al. · 5.59
- From Vulnerabilities to Remediation: A Systematic Literature Review of LLMs in Code Security (2024) · Enna Basic et al. · 5.52
- CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging (2025) · Md. Ashraful Islam et al. · 5.46
- Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation (2025) · Yue Yang et al. · 5.46
- TOGLL: Correct and Strong Test Oracle Generation with LLMs (2024) · Soneya Binta Hossain and Matthew Dwyer · 5.40
- TransAgent: Enhancing LLM-Based Code Translation via Fine-Grained Execution Alignment (2024) · Zhiqiang Yuan et al. · 5.35
- DocAgent: A Multi-Agent System for Automated Code Documentation Generation (2025) · Dayu Yang et al. · 5.35
- Quality In, Quality Out: Investigating Training Data's Role in AI Code Generation (2025) · Cristina Improta et al. · 5.29
- Unified Modeling Language Code Generation from Diagram Images Using Multimodal Large Language Models (2025) · Averi Bates et al. · 5.29
- Competitive Programming with Large Reasoning Models (2025) · OpenAI: Ahmed El-Kishky et al. · 5.24
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study (2024) · Shihan Dou et al. · 5.20
- KernelGPT: Enhanced Kernel Fuzzing via Large Language Models (2024) · Chenyuan Yang et al. · 5.18
- C2SaferRust: Transforming C Projects into Safer Rust with NeuroSymbolic Techniques (2025) · Vikram Nitin et al. · 5.18
- BitsAI-CR: Automated Code Review via LLM in Practice (2025) · Tao Sun et al. · 5.18
- On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization (2025) · Giuseppe Crupi et al. · 5.10
- Towards Translating Real-World Code with LLMs: A Study of Translating to Rust (2024) · Hasan Ferit Eniser et al. · 5.09
- A Closer Look into Transformer-Based Code Intelligence Through Code Transformation: Challenges and Opportunities (2022) · Yaoxian Li et al. · 5.06
- KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding (2025) · Zhangchen Xu et al. · 5.04
- CodeContests+: High-Quality Test Case Generation for Competitive Programming (2025) · Zihan Wang et al. · 5.04
- A Survey on LLM-based Code Generation for Low-Resource and Domain-Specific Programming Languages (2024) · Sathvik Joel et al. · 5.02
- Willing but Unable: Separating Refusal from Capability in Code LLMs via Abliteration (2026) · Cristina Carleo et al. · 5.01
- Mutation Without Variation: Convergence Dynamics in LLM-Driven Program Evolution (2026) · Can Gurkan et al. · 5.01
- Context-Based Adversarial Attacks on AI Code Generators: Vulnerability Analysis and Implications (2026) · Walther A. Del Orbe et al. · 5.01
- Minimal Prompt Perturbations Lead to Code Vulnerabilities: Prompt Fragility and Hidden-State Signals in Coding LLMs (2026) · Alexander Sternfeld et al. · 4.95
- SpecGen: Automated Generation of Formal Program Specifications via Large Language Models (2024) · Lezhi Ma et al. · 4.87
- CodeV: Empowering LLMs with HDL Generation through Multi-Level Summarization (2024) · Yang Zhao et al. · 4.85
- Optimizing Datasets for Code Summarization: Is Code-Comment Coherence Enough? (2025) · Antonio Vitale et al. · 4.82
- Improving Deep Assertion Generation via Fine-Tuning Retrieval-Augmented Pre-trained Language Models (2025) · Quanjun Zhang et al. · 4.82
- Code Summarization Beyond Function Level (2025) · Vladimir Makharev et al. · 4.82
- Should AI Optimize Your Code? A Comparative Study of Classical Optimizing Compilers Versus Current Large Language Models (2024) · Miguel Romero Rosas et al. · 4.79
- SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution (2025) · Chengxing Xie et al. · 4.76
- Test Wars: A Comparative Study of SBST, Symbolic Execution, and LLM-Based Approaches to Unit Test Generation (2025) · Azat Abdullin et al. · 4.76
- ALMAS: an Autonomous LLM-based Multi-Agent Software Engineering Framework (2025) · Vali Tawosi et al. · 4.75
- Are Large Language Models Memorizing Bug Benchmarks? (2024) · Daniel Ramos et al. · 4.65
- ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart Contracts (2024) · Lyuye Zhang et al. · 4.63
- A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond (2024) · Qiushi Sun et al. · 4.63
- Decoding Secret Memorization in Code LLMs Through Token-Level Characterization (2024) · Yuqing Nie et al. · 4.60
- GeoJSEval: An Automated Evaluation Framework for Large Language Models on JavaScript-Based Geospatial Computation and Visualization Code Generation (2025) · Guanyu Chen et al. · 4.58