Awesome Software Engineering
Software Engineering is one of the most active areas in Awesome AI for Code, with 6,211 papers in this collection. Common evaluation datasets include HumanEval, MBPP, and SWE-bench. A strong starting point is "Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution".
Key papers
- Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution (2026) · Liliana Hotsko et al. · 12.88
- A Survey on Large Language Models for Code Generation (2024) · Juyong Jiang et al. · 9.98
- How Can ChatGPT Support Human Security Testers to Help Mitigate Supply Chain Attacks? (2023) · Ying Zhang et al. · 8.58
- REPOT: Recoverable Program-of-Thought via Checkpoint Repair (2026) · Parsa Mazaheri · 8.51
- CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation (2026) · Weinan Dai et al. · 8.17
- LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights (2025) · Ze Sheng et al. · 7.50
- An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities (2025) · Zezhou Yang et al. · 7.29
- To Err is Machine: Vulnerability Detection Challenges LLM Reasoning (2024) · Benjamin Steenhoek et al. · 7.16
- IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities (2024) · Ziyang Li et al. · 7.00
- RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair (2023) · André Silva et al. · 6.86
- GPIoT: Tailoring Small Language Models for IoT Program Synthesis and Development (2025) · Leming Shen et al. · 6.47
- Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents (2026) · Xiang Liu et al. · 6.46
- A Showdown of ChatGPT vs DeepSeek in Solving Programming Tasks (2025) · Ronas Shakya et al. · 6.23
- AI-Powered, But Power-Hungry? Energy Efficiency of LLM-Generated Code (2025) · Lola Solovyeva et al. · 6.17
- Bridging the Editing Gap in LLMs: FineEdit for Precise and Targeted Text Modifications (2025) · Yiming Zeng et al. · 6.17
- Security Weaknesses of Copilot-Generated Code in GitHub Projects: An Empirical Study (2023) · Yujia Fu et al. · 6.13
- A Survey on Evaluating Large Language Models in Code Generation Tasks (2024) · Liguo Chen et al. · 6.08
- PythonPal: Enhancing Online Programming Education through Chatbot-Driven Personalized Feedback (2025) · Sirinda Palahan · 5.96
- Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models (2025) · Marc Bruni et al. · 5.90
- Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization (2026) · Anmol Agarwal et al. · 5.81
- SVA-ICL: Improving LLM-based Software Vulnerability Assessment via In-Context Learning and Information Fusion (2025) · Chaoyang Gao et al. · 5.76
- Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models (2025) · Anastasiia Grishina et al. · 5.65
- Translating Regulatory Clauses into Executable Codes for Building Design Checking via Large Language Model Driven Function Matching and Composing (2023) · Zhe Zheng et al. · 5.63
- The Impact of Large Language Models on Open-source Innovation: Evidence from GitHub Copilot (2024) · Doron Yeverechyahu et al. · 5.62
- COFFE: A Code Efficiency Benchmark for Code Generation (2025) · Yun Peng et al. · 5.59
- LLM-Generated Microservice Implementations from RESTful API Definitions (2025) · Saurabh Chauhan et al. · 5.59
- SecureFalcon: Are We There Yet in Automated Software Vulnerability Detection with LLMs? (2023) · Mohamed Amine Ferrag et al. · 5.58
- TOGLL: Correct and Strong Test Oracle Generation with LLMs (2024) · Soneya Binta Hossain and Matthew Dwyer · 5.40
- TransAgent: Enhancing LLM-Based Code Translation via Fine-Grained Execution Alignment (2024) · Zhiqiang Yuan et al. · 5.35
- DocAgent: A Multi-Agent System for Automated Code Documentation Generation (2025) · Dayu Yang et al. · 5.35
- Advancing Code Coverage: Incorporating Program Analysis with Large Language Models (2024) · Chen Yang et al. · 5.34
- Quality In, Quality Out: Investigating Training Data's Role in AI Code Generation (2025) · Cristina Improta et al. · 5.29
- Unified Modeling Language Code Generation from Diagram Images Using Multimodal Large Language Models (2025) · Averi Bates et al. · 5.29
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study (2024) · Shihan Dou et al. · 5.20
- C2SaferRust: Transforming C Projects into Safer Rust with NeuroSymbolic Techniques (2025) · Vikram Nitin et al. · 5.18
- BitsAI-CR: Automated Code Review via LLM in Practice (2025) · Tao Sun et al. · 5.18
- Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration (2024) · Yingwei Ma et al. · 5.15
- On the Effectiveness of LLM-as-a-judge for Code Generation and Summarization (2025) · Giuseppe Crupi et al. · 5.10
- A Closer Look into Transformer-Based Code Intelligence Through Code Transformation: Challenges and Opportunities (2022) · Yaoxian Li et al. · 5.06
- AI-powered Code Review with LLMs: Early Results (2024) · Zeeshan Rasheed et al. · 5.04
- KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding (2025) · Zhangchen Xu et al. · 5.04
- A Survey on LLM-based Code Generation for Low-Resource and Domain-Specific Programming Languages (2024) · Sathvik Joel et al. · 5.02
- Willing but Unable: Separating Refusal from Capability in Code LLMs via Abliteration (2026) · Cristina Carleo et al. · 5.01
- Mutation Without Variation: Convergence Dynamics in LLM-Driven Program Evolution (2026) · Can Gurkan et al. · 5.01
- Securing Code Understanding: Detecting Natural Backdoor Vulnerability in Code Language Models (2026) · Yuchen Chen et al. · 5.01
- Context-Based Adversarial Attacks on AI Code Generators: Vulnerability Analysis and Implications (2026) · Walther A. Del Orbe et al. · 5.01
- Variational Prefix Tuning for Diverse and Accurate Code Summarization Using Pre-trained Language Models (2025) · Junda Zhao et al. · 4.98
- Minimal Prompt Perturbations Lead to Code Vulnerabilities: Prompt Fragility and Hidden-State Signals in Coding LLMs (2026) · Alexander Sternfeld et al. · 4.95
- Validating Network Protocol Parsers with Traceable RFC Document Interpretation (2025) · Mingwei Zheng et al. · 4.93
- SpecGen: Automated Generation of Formal Program Specifications via Large Language Models (2024) · Lezhi Ma et al. · 4.87
- Enhancing Software Vulnerability Detection Using Code Property Graphs and Convolutional Neural Networks (2025) · Amanpreet Singh Saimbhi · 4.87
- StepGrade: Grading Programming Assignments with Context-Aware LLMs (2025) · Mohammad Akyash et al. · 4.87
- Combining Language and App UI Analysis for the Automated Assessment of Bug Reproduction Steps (2025) · Junayed Mahmud et al. · 4.82
- Optimizing Datasets for Code Summarization: Is Code-Comment Coherence Enough? (2025) · Antonio Vitale et al. · 4.82
- Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis (2025) · Jiahao Gai et al. · 4.82
- Improving Deep Assertion Generation via Fine-Tuning Retrieval-Augmented Pre-trained Language Models (2025) · Quanjun Zhang et al. · 4.82
- Code Summarization Beyond Function Level (2025) · Vladimir Makharev et al. · 4.82
- Should AI Optimize Your Code? A Comparative Study of Classical Optimizing Compilers Versus Current Large Language Models (2024) · Miguel Romero Rosas et al. · 4.79
- LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models (2025) · Mohamad Fakih et al. · 4.76
- SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution (2025) · Chengxing Xie et al. · 4.76