Awesome AI for Code

Software Engineering

© 2026 Awesome Papers.

Awesome Software Engineering — curated papers, datasets & benchmarks · Awesome AI for Code

Awesome Software Engineering

Software Engineering is one of the most active areas in Awesome AI for Code, with 6,390 papers in this collection evaluated on datasets such as HumanEval, MBPP, and SWE-bench. A strong starting point is "Program-as-Weights: A Programming Paradigm for Fuzzy Functions".

Datasets & benchmarks

HumanEval · 203 papers

MBPP · 119 papers · 🤗

SWE-bench · 84 papers · 🤗

Defects4J · 83 papers

LiveCodeBench · 61 papers

SWE-bench Lite · 53 papers

Stack Overflow · 43 papers

SWT-Bench Verified · 39 papers

LeetCode · 39 papers

Python · 34 papers

SWEBench-Verified · 31 papers

Key papers

60 papers · trending (default) · numbers = 🔥 heat

Program-as-Weights: A Programming Paradigm for Fuzzy Functions (2026)
Wentao Zhang et al.
13.10
Code2LoRA: Hypernetwork-Generated Adapters for Code Language Models under Software Evolution (2026)
Liliana Hotsko et al.
11.89
Tencent WorkBuddy Bench: A Multi-Domain Coding-Agent Benchmark with Contamination-Resistant Task Construction (2026)
Tencent WorkBuddy Bench Team et al.
11.28
A Survey on Large Language Models for Code Generation (2024)
Juyong Jiang et al.
9.98
AutoTrainess: Teaching Language Models to Improve Language Models Autonomously (2026)
Zhaojian Yu et al.
9.95
NVIDIA-labs OO Agents: Native Python Object-Oriented Agents (2026)
Paul Furgale et al.
9.89
How Can ChatGPT Support Human Security Testers to Help Mitigate Supply Chain Attacks? (2023)
Ying Zhang et al.
8.58
REPOT: Recoverable Program-of-Thought via Checkpoint Repair (2026)
Parsa Mazaheri
8.40
OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment (2025)
Xiangjin Xie et al.
8.22
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation (2026)
Weinan Dai et al.
8.06
OpenForgeRL: Train Harness-native Agents in Any Environment (2026)
Xiao Yu et al.
7.92
LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights (2025)
Ze Sheng et al.
7.39
An Empirical Study of Retrieval-Augmented Code Generation: Challenges and Opportunities (2025)
Zezhou Yang et al.
7.18
To Err is Machine: Vulnerability Detection Challenges LLM Reasoning (2024)
Benjamin Steenhoek et al.
7.16
PlotGen: Multi-Agent LLM-based Scientific Data Visualization via Multimodal Feedback (2025)
Kanika Goswami et al.
7.13
IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities (2024)
Ziyang Li et al.
7.00
RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair (2023)
André Silva et al.
6.86
LlamaRestTest: Effective REST API Testing with Small Language Models (2025)
Myeongsoo Kim et al.
6.67
BugWhisperer: Fine-Tuning LLMs for SoC Hardware Vulnerability Detection (2025)
Shams Tarek et al.
6.47
GPIoT: Tailoring Small Language Models for IoT Program Synthesis and Development (2025)
Leming Shen et al.
6.36
Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents (2026)
Xiang Liu et al.
6.35
AI-Powered, But Power-Hungry? Energy Efficiency of LLM-Generated Code (2025)
Lola Solovyeva et al.
6.30
Code Summarization Beyond Function Level (2025)
Vladimir Makharev et al.
6.30
SVA-ICL: Improving LLM-based Software Vulnerability Assessment via In-Context Learning and Information Fusion (2025)
Chaoyang Gao et al.
6.23
Security Weaknesses of Copilot-Generated Code in GitHub Projects: An Empirical Study (2023)
Yujia Fu et al.
6.13
A Showdown of ChatGPT vs DeepSeek in Solving Programming Tasks (2025)
Ronas Shakya et al.
6.12
A Survey on Evaluating Large Language Models in Code Generation Tasks (2024)
Liguo Chen et al.
6.08
LLM Agents Making Agent Tools (2025)
Georg Wölflein et al.
6.06
Bridging the Editing Gap in LLMs: FineEdit for Precise and Targeted Text Modifications (2025)
Yiming Zeng, Wanhao Yu, Zexin Li, Tao Ren, Yu Ma, Jinghan Cao, Xiyan Chen, and Tingting Yu
6.06
Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing (2024)
Asmita et al.
6.02
Quality In, Quality Out: Investigating Training Data's Role in AI Code Generation (2025)
Cristina Improta et al.
5.84
PythonPal: Enhancing Online Programming Education through Chatbot-Driven Personalized Feedback (2025)
Sirinda Palahan
5.84
Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models (2025)
Marc Bruni et al.
5.79
DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs (2025)
Mohammad Akyash et al.
5.76
Learning Software Bug Reports: A Systematic Literature Review (2025)
Guoming Long et al.
5.76
DigitalCoach: Communication and Grounding Gaps in Human and Agentic Computer Use Coaching (2026)
Meng Chen et al.
5.76
Verus-SpecGym: An Agentic Environment for Evaluating Specification Autoformalization (2026)
Anmol Agarwal et al.
5.70
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding (2025)
Zhangchen Xu et al.
5.70
Guiding LLM-based Smart Contract Generation with Finite State Machine (2025)
Hao Luo et al.
5.65
Translating Regulatory Clauses into Executable Codes for Building Design Checking via Large Language Model Driven Function Matching and Composing (2023)
Zhe Zheng et al.
5.63
The Impact of Large Language Models on Open-source Innovation: Evidence from GitHub Copilot (2024)
Doron Yeverechyahu et al.
5.62
Do Prompt Patterns Affect Code Quality? A First Empirical Assessment of ChatGPT-Generated Code (2025)
Antonio Della Porta et al.
5.59
Benchmarking LLM for Code Smells Detection: OpenAI GPT-4.0 vs DeepSeek-V3 (2025)
Ahmed R. Sadik et al.
5.59
Automatically Generating Rules of Malicious Software Packages via Large Language Model (2025)
XiangRui Zhang et al.
5.59
SecureFalcon: Are We There Yet in Automated Software Vulnerability Detection with LLMs? (2023)
Mohamed Amine Ferrag et al.
5.58
Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models (2025)
Anastasiia Grishina, Vadim Liventsev, Aki Härmä, and Leon Moonen
5.54
StepGrade: Grading Programming Assignments with Context-Aware LLMs (2025)
Mohammad Akyash et al.
5.54
COFFE: A Code Efficiency Benchmark for Code Generation (2025)
Yun Peng et al.
5.48
LLM-Generated Microservice Implementations from RESTful API Definitions (2025)
Saurabh Chauhan et al.
5.48
TOGLL: Correct and Strong Test Oracle Generation with LLMs (2024)
Soneya Binta Hossain and Matthew Dwyer
5.40
TransAgent: Enhancing LLM-Based Code Translation via Fine-Grained Execution Alignment (2024)
Zhiqiang Yuan et al.
5.35
Advancing Code Coverage: Incorporating Program Analysis with Large Language Models (2024)
Chen Yang et al.
5.34
DocAgent: A Multi-Agent System for Automated Code Documentation Generation (2025)
Dayu Yang et al.
5.24
Leveraging LLM to Strengthen ML-Based Cross-Site Scripting Detection (2025)
Dennis Miczek et al.
5.24
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study (2024)
Shihan Dou et al.
5.20
Human-Human-AI Triadic Programming: Uncovering the Role of AI Agent and the Value of Human Partner in Collaborative Learning (2026)
Taufiq Daryanto et al.
5.19
Agentic Much? Adoption of Coding Agents on GitHub (2026)
Romain Robbes et al.
5.19
KernelGPT: Enhanced Kernel Fuzzing via Large Language Models (2024)
Chenyuan Yang et al.
5.18
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding (2025)
Zhangchen Xu et al.
5.18
ORANSight-2.0: Foundational LLMs for O-RAN (2025)
Pranshav Gajjar et al.
5.18