Awesome Survey Paper
Survey Paper is one of the most active areas in Awesome AI for Code, with 230 papers in this collection, evaluated on datasets such as HumanEval, Stack Overflow, and Spider. A strong starting point is "A Survey on Large Language Models for Code Generation".
Key papers
- A Survey on Large Language Models for Code Generation (2024); Juyong Jiang et al.; 9.98
- LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights (2025); Ze Sheng et al.; 7.50
- A Survey on Evaluating Large Language Models in Code Generation Tasks (2024); Liguo Chen et al.; 6.08
- LLM-Generated Microservice Implementations from RESTful API Definitions (2025); Saurabh Chauhan et al.; 5.59
- From Vulnerabilities to Remediation: A Systematic Literature Review of LLMs in Code Security (2024); Enna Basic et al.; 5.52
- A Survey on LLM-based Code Generation for Low-Resource and Domain-Specific Programming Languages (2024); Sathvik Joel et al.; 5.02
- A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond (2024); Qiushi Sun et al.; 4.63
- Learning Software Bug Reports: A Systematic Literature Review (2025); Guoming Long et al.; 4.58
- On the Challenges of Fuzzing Techniques via Large Language Models (2024); Linghan Huang et al.; 4.57
- Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI (2025); Ranjan Sapkota et al.; 4.53
- Assessing and Advancing Benchmarks for Evaluating Large Language Models in Software Engineering Tasks (2025); Xing Hu et al.; 4.47
- Source Code Summarization in the Era of Large Language Models (2024); Weisong Sun, Yun Miao, Yuekang Li, Hongyu Zhang, Chunrong Fang, Yi Liu, Gelei Deng, Yang Liu, and Zhenyu Chen; 4.43
- Usability Analysis of Configurator User Interfaces with Multimodal Large Language Models (2026); Sebastian Lubos et al.; 4.33
- Requirements-Driven Automated Software Testing: A Systematic Review (2025); Fanyu Wang et al.; 4.30
- A Systematic Literature Review on Explainability for Machine/Deep Learning-based Software Engineering Research (2024); Sicong Cao et al.; 4.10
- Intent Formalization: A Grand Challenge for Reliable Coding in the Age of AI Agents (2026); Shuvendu K. Lahiri; 4.09
- Code as Agent Harness (2026); Xuying Ning et al.; 3.99
- Large Language Models for Code Generation: A Comprehensive Survey of Challenges, Techniques, Evaluation, and Applications (2025); Nam Huynh and Beiyu Lin; 3.70
- How Are We Doing With Using AI-Based Programming Assistants For Privacy-Related Code Generation? The Developers' Experience (2025); Kashumi Madampe et al.; 3.70
- Towards Advancing Code Generation with Large Language Models: A Research Roadmap (2025); Haolin Jin et al.; 3.59
- mHumanEval -- A Multilingual Benchmark to Evaluate Large Language Models for Code Generation (2024); Nishat Raihan et al.; 3.42
- A Large-Scale Study of Model Integration in ML-Enabled Software Systems (2024); Yorick Sens et al.; 3.31
- From Code Foundation Models to Agents and Applications: A Comprehensive Survey and Practical Guide to Code Intelligence (2025); Jian Yang et al.; 3.21
- A Survey on Code Generation with LLM-based Agents (2025); Yihong Dong et al.; 3.04
- A Deep Dive into Retrieval-Augmented Generation for Code Completion: Experience on WeChat (2025); Zezhou Yang et al.; 2.99
- A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications (2025); Boyang Yang et al.; 2.93
- Exploring the Landscape of Text-to-SQL with Large Language Models: Progresses, Challenges and Opportunities (2025); Yiming Huang et al.; 2.87
- Build Code Needs Maintenance Too: A Study on Refactoring and Technical Debt in Build Systems (2025); Anwar Ghammam et al.; 2.82
- On Developers' Self-Declaration of AI-Generated Code: An Analysis of Practices (2025); Syed Mohammad Kashif et al.; 2.82
- Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Unified Approach for Elevating Benchmark Quality (2025); Roham Koohestani et al.; 2.76
- Large Language Models for Code Generation: The Practitioners Perspective (2025); Zeeshan Rasheed et al.; 2.65
- An Empirical Study on Challenges for LLM Application Developers (2024); Xiang Chen et al.; 2.37
- Prompting Techniques for Secure Code Generation: A Systematic Investigation (2024); Catherine Tony et al.; 2.32
- Towards Automated Kernel Generation in the Era of LLMs (2026); Yang Yu et al.; 2.00
- Chatbot-Based Assessment of Code Understanding in Automated Programming Assessment Systems (2026); Eduard Frankford et al.; 1.89
- Engineering Students' Usage and Perceptions of GitHub Copilot in Open-Source Projects (2026); Neha Rani et al.; 1.89
- LLM-Enhanced Log Anomaly Detection: A Comprehensive Benchmark of Large Language Models for Automated System Diagnostics (2026); Disha Patel; 1.89
- Prompt-Driven Code Summarization: A Systematic Literature Review (2026); Afia Farjana et al.; 1.89
- LLM-Based Multi-Agent Systems for Code Generation: A Multi-Vocal Literature Review (2026); Zeeshan Rasheeda et al.; 1.89
- Sustainable Code Generation Using Large Language Models: A Systematic Literature Review (2026); Sabiya Banu Masthan Ali et al.; 1.83
- Human in the Loop for Fuzz Testing: Literature Review and the Road Ahead (2026); Jiongchi Yu, Xiaolin Wen, Sizhe Cheng, Xiaofei Xie, Qiang Hu, and Yong Wang; 1.83
- Factors Influencing the Quality of AI-Generated Code: A Synthesis of Empirical Evidence (2026); Vehid Geruslu et al.; 1.83
- A Survey of Code Review Benchmarks and Evaluation Practices in Pre-LLM and LLM Era (2026); Taufiqul Islam Khan et al.; 1.78
- Reporting LLM Prompting in Automated Software Engineering: A Guideline Based on Current Practices and Expectations (2026); Alexander Korn et al.; 1.72
- Large Language Model for Verilog Code Generation: Literature Review and the Road Ahead (2025); Guang Yang et al.; 1.67
- Large Language Models for Software Engineering: A Reproducibility Crisis (2025); Mohammed Latif Siddiq et al.; 1.67
- A Survey of Bugs in AI-Generated Code (2025); Ruofan Gao et al.; 1.67
- VeruSAGE: A Study of Agent-Based Verification for Rust Systems (2025); Chenyuan Yang et al.; 1.67
- Designing LLM-based Multi-Agent Systems for Software Engineering Tasks: Quality Attributes, Design Patterns and Rationale (2025); Yangxiao Cai et al.; 1.61
- Large Language Models for Unit Test Generation: Achievements, Challenges, and Opportunities (2025); Bei Chu et al.; 1.61
- Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches (2025); Yicheng Tao et al.; 1.56
- Prompting in Practice: Investigating Software Practitioners' Use of Generative AI Tools (2025); Daniel Otten et al.; 1.56
- Impact of LLMs on Team Collaboration in Software Development (2025); Devang Dhanuka; 1.56
- A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System (2025); Jiale Guo et al.; 1.56
- Generative AI and the Transformation of Software Development Practices (2025); Vivek Acharya; 1.56
- A Survey on Feedback Types in Automated Programming Assessment Systems (2025); Eduard Frankford et al.; 1.56
- Review of Tools for Zero-Code LLM Based Application Development (2025); Priyaranjan Pattnayak et al.; 1.56
- Does In-IDE Calibration of Large Language Models work at Scale? (2025); Roham Koohestani et al.; 1.56
- LLM-as-a-Judge for Software Engineering: Literature Review, Vision, and the Road Ahead (2025); Junda He et al.; 1.56
- Reflections on the Reproducibility of Commercial LLM Performance in Empirical Software Engineering Studies (2025); Florian Angermeir et al.; 1.56