Awesome AI for Code



© 2026 Awesome Papers.



Awesome Bug Detection

Bug Detection is one of the most active areas in Awesome AI for Code, with 1,858 papers in this collection. Frequently used evaluation datasets include Defects4J, HumanEval, and SWE-bench. A strong starting point is "How Can ChatGPT Support Human Security Testers to Help Mitigate Supply Chain Attacks?".

Datasets & benchmarks

Defects4J: 40 papers

HumanEval: 33 papers

SWE-bench: 23 papers · 🤗

SWE-bench Lite: 18 papers

MBPP: 16 papers · 🤗

LiveCodeBench: 15 papers

BigVul: 14 papers

TensorFlow: 10 papers

Linux kernel: 9 papers

PyTorch: 9 papers

SWEBench-Verified: 9 papers

BigCloneBench: 8 papers

Key papers

60 papers, sorted by trending (the default). The number under each entry is its 🔥 heat score.

How Can ChatGPT Support Human Security Testers to Help Mitigate Supply Chain Attacks? (2023)
Ying Zhang et al.
8.58
LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights (2025)
Ze Sheng et al.
7.39
To Err is Machine: Vulnerability Detection Challenges LLM Reasoning (2024)
Benjamin Steenhoek et al.
7.16
IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities (2024)
Ziyang Li et al.
7.00
BugWhisperer: Fine-Tuning LLMs for SoC Hardware Vulnerability Detection (2025)
Shams Tarek et al.
6.47
Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents (2026)
Xiang Liu et al.
6.35
SVA-ICL: Improving LLM-based Software Vulnerability Assessment via In-Context Learning and Information Fusion (2025)
Chaoyang Gao et al.
6.23
Security Weaknesses of Copilot-Generated Code in GitHub Projects: An Empirical Study (2023)
Yujia Fu et al.
6.13
Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing (2024)
Asmita et al.
6.02
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging (2025)
Md. Ashraful Islam et al.
6.01
Quality In, Quality Out: Investigating Training Data's Role in AI Code Generation (2025)
Cristina Improta et al.
5.84
PythonPal: Enhancing Online Programming Education through Chatbot-Driven Personalized Feedback (2025)
Sirinda Palahan
5.84
Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models (2025)
Marc Bruni et al.
5.79
Learning Software Bug Reports: A Systematic Literature Review (2025)
Guoming Long et al.
5.76
Guiding LLM-based Smart Contract Generation with Finite State Machine (2025)
Hao Luo et al.
5.65
Benchmarking LLM for Code Smells Detection: OpenAI GPT-4.0 vs DeepSeek-V3 (2025)
Ahmed R. Sadik et al.
5.59
Automatically Generating Rules of Malicious Software Packages via Large Language Model (2025)
XiangRui Zhang et al.
5.59
SecureFalcon: Are We There Yet in Automated Software Vulnerability Detection with LLMs? (2023)
Mohamed Amine Ferrag et al.
5.58
From Vulnerabilities to Remediation: A Systematic Literature Review of LLMs in Code Security (2024)
Enna Basic et al.
5.52
TOGLL: Correct and Strong Test Oracle Generation with LLMs (2024)
Soneya Binta Hossain and Matthew Dwyer
5.40
Leveraging LLM to Strengthen ML-Based Cross-Site Scripting Detection (2025)
Dennis Miczek et al.
5.24
What's Wrong with Your Code Generated by Large Language Models? An Extensive Study (2024)
Shihan Dou et al.
5.20
KernelGPT: Enhanced Kernel Fuzzing via Large Language Models (2024)
Chenyuan Yang et al.
5.18
Evaluating LLaMA 3.2 for Software Vulnerability Detection (2025)
José Gonçalves et al.
5.18
Combining Large Language Models with Static Analyzers for Code Review Generation (2025)
Imen Jaoua et al.
5.13
SmartLLM: Smart Contract Auditing using Custom Generative AI (2025)
Jun Kevin and Pujianto Yugopuspito
5.13
Correctness Assessment of Code Generated by Large Language Models Using Internal Representations (2025)
Tuan-Dung Bui et al.
5.07
BitsAI-CR: Automated Code Review via LLM in Practice (2025)
Tao Sun et al.
5.07
AI-powered Code Review with LLMs: Early Results (2024)
Zeeshan Rasheed et al.
5.04
UCSC NLP at SemEval-2026 Task 10: Boundary-Aware Span Extraction and RoBERTa Classification for Conspiracy Detection (2026)
Dom Marhoefer et al.
4.95
thaulab@EEUCA 2026: Who Said What to Whom? A Targeting-Aware Neural-Symbolic Pipeline for Gaming Toxicity Detection (2026)
Anmol Guragain et al.
4.95
Willing but Unable: Separating Refusal from Capability in Code LLMs via Abliteration (2026)
Cristina Carleo et al.
4.90
Context-Based Adversarial Attacks on AI Code Generators: Vulnerability Analysis and Implications (2026)
Walther A. Del Orbe et al.
4.90
ICVul: A Well-labeled C/C++ Vulnerability Dataset with Comprehensive Metadata and VCCs (2025)
Chaomeng Lu et al.
4.87
Minimal Prompt Perturbations Lead to Code Vulnerabilities: Prompt Fragility and Hidden-State Signals in Coding LLMs (2026)
Alexander Sternfeld et al.
4.84
On Benchmarking Code LLMs for Android Malware Analysis (2025)
Yiling He et al.
4.82
Validating Network Protocol Parsers with Traceable RFC Document Interpretation (2025)
Mingwei Zheng et al.
4.82
Enhancing Software Vulnerability Detection Using Code Property Graphs and Convolutional Neural Networks (2025)
Amanpreet Singh Saimbhi
4.76
AsserT5: Test Assertion Generation Using a Fine-Tuned Code Language Model (2025)
Severin Primbs et al.
4.71
Combining Language and App UI Analysis for the Automated Assessment of Bug Reproduction Steps (2025)
Junayed Mahmud et al.
4.71
Are Large Language Models Memorizing Bug Benchmarks? (2024)
Daniel Ramos et al.
4.65
Cracks in The Stack: Hidden Vulnerabilities and Licensing Risks in LLM Pre-Training Datasets (2025)
Mahmoud Jahanshahi et al.
4.65
LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models (2025)
Mohamad Fakih et al.
4.65
Evaluating Agent-based Program Repair at Google (2025)
Pat Rondon et al.
4.65
Augmenting Smart Contract Decompiler Output through Fine-grained Dependency Analysis and LLM-facilitated Semantic Recovery (2025)
Zeqin Liao et al.
4.65
Code Change Intention, Development Artifact and History Vulnerability: Putting Them Together for Vulnerability Fix Detection by LLM (2025)
Xu Yang et al.
4.65
ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart Contracts (2024)
Lyuye Zhang et al.
4.63
Human-Written vs. AI-Generated Code: A Large-Scale Study of Defects, Vulnerabilities, and Complexity (2025)
Domenico Cotroneo et al.
4.53
A Preliminary Study of Large Language Models for Multilingual Vulnerability Detection (2025)
Junji Yu et al.
4.36
Refploit: Facilitating Exploit Construction via Code-Agent Trajectory Repair (2026)
Zirui Chen et al.
4.33
Regression Accumulation in Multi-Turn LLM Programming Conversations (2026)
Yonghui (Andie) et al.
4.33
An Exploratory Study on LLM-Generated Code and Comments in Code Repositories (2026)
Yongyi Ji et al.
4.33
Prompt Coverage Adequacy (2026)
Florian Tambon et al.
4.33
EvoEye: Self-Evolving Runtime Monitoring for Autonomous Driving Systems (2026)
Mingfei Cheng et al.
4.33
Correct but Slow: An Empirical Study of the GPU Kernel Evaluation Gap in Modern Domain-Specific Languages (2026)
Tingxi Li et al.
4.33
Detecting Vulnerability-Inducing Commits via Multi-Stage Reasoning with LLM-Based Agents (2026)
Liyou Chen et al.
4.33
Beyond Refusal: A Same-Lineage Study of Aligned and Abliterated LLMs for Vulnerability Analysis (2026)
Mingchen Li et al.
4.33
Mitigating Errors in LLM-Generated Web API Invocations via Retrieval-Augmented Generation and Constrained Decoding (2026)
Daniel Maninger et al.
4.33
Collaborative Multi-Agent Testing for Emergent Failure Discovery in Autonomous Driving Systems (2026)
Ruizhen Gu et al.
4.33
AgentCheck: A Reproduce-Intervene-Mitigate Workbench for LLM Agents over MCP (2026)
Aritra Mazumder et al.
4.33