Awesome Bug Detection
Bug detection is one of the most active areas in Awesome AI for Code, with 2,160 papers in this collection, evaluated on datasets such as Defects4J, HumanEval, and SWE-bench. A strong starting point is "How Can ChatGPT Support Human Security Testers to Help Mitigate Supply Chain Attacks?".
Datasets & benchmarks
Key papers
- How Can ChatGPT Support Human Security Testers to Help Mitigate Supply Chain Attacks? (2023) · Ying Zhang et al. · 8.58
- LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights (2025) · Ze Sheng et al. · 7.50
- To Err is Machine: Vulnerability Detection Challenges LLM Reasoning (2024) · Benjamin Steenhoek et al. · 7.16
- IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities (2024) · Ziyang Li et al. · 7.00
- PlotGen: Multi-Agent LLM-based Scientific Data Visualization via Multimodal Feedback (2025) · Kanika Goswami et al. · 6.64
- Agora: Toward Autonomous Bug Detection in Production-Level Consensus Protocols with LLM Agents (2026) · Xiang Liu et al. · 6.46
- Security Weaknesses of Copilot-Generated Code in GitHub Projects: An Empirical Study (2023) · Yujia Fu et al. · 6.13
- Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing (2024) · Asmita et al. · 6.02
- PythonPal: Enhancing Online Programming Education through Chatbot-Driven Personalized Feedback (2025) · Sirinda Palahan · 5.96
- Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models (2025) · Marc Bruni et al. · 5.90
- SVA-ICL: Improving LLM-based Software Vulnerability Assessment via In-Context Learning and Information Fusion (2025) · Chaoyang Gao et al. · 5.76
- Fully Autonomous Programming using Iterative Multi-Agent Debugging with Large Language Models (2025) · Anastasiia Grishina, Vadim Liventsev, Aki Härmä, and Leon Moonen · 5.65
- SecureFalcon: Are We There Yet in Automated Software Vulnerability Detection with LLMs? (2023) · Mohamed Amine Ferrag et al. · 5.58
- From Vulnerabilities to Remediation: A Systematic Literature Review of LLMs in Code Security (2024) · Enna Basic et al. · 5.52
- CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging (2025) · Md. Ashraful Islam et al. · 5.46
- TOGLL: Correct and Strong Test Oracle Generation with LLMs (2024) · Soneya Binta Hossain and Matthew Dwyer · 5.40
- Quality In, Quality Out: Investigating Training Data's Role in AI Code Generation (2025) · Cristina Improta et al. · 5.29
- What's Wrong with Your Code Generated by Large Language Models? An Extensive Study (2024) · Shihan Dou et al. · 5.20
- KernelGPT: Enhanced Kernel Fuzzing via Large Language Models (2024) · Chenyuan Yang et al. · 5.18
- BitsAI-CR: Automated Code Review via LLM in Practice (2025) · Tao Sun et al. · 5.18
- Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration (2024) · Yingwei Ma, Qingping Yang, Rongyu Cao, Binhua Li, Fei Huang, and Yongbin Li · 5.15
- Towards Translating Real-World Code with LLMs: A Study of Translating to Rust (2024) · Hasan Ferit Eniser et al. · 5.09
- AI-powered Code Review with LLMs: Early Results (2024) · Zeeshan Rasheed et al. · 5.04
- Willing but Unable: Separating Refusal from Capability in Code LLMs via Abliteration (2026) · Cristina Carleo et al. · 5.01
- Securing Code Understanding: Detecting Natural Backdoor Vulnerability in Code Language Models (2026) · Yuchen Chen et al. · 5.01
- Context-Based Adversarial Attacks on AI Code Generators: Vulnerability Analysis and Implications (2026) · Walther A. Del Orbe et al. · 5.01
- Minimal Prompt Perturbations Lead to Code Vulnerabilities: Prompt Fragility and Hidden-State Signals in Coding LLMs (2026) · Alexander Sternfeld et al. · 4.95
- On Benchmarking Code LLMs for Android Malware Analysis (2025) · Yiling He et al. · 4.93
- Validating Network Protocol Parsers with Traceable RFC Document Interpretation (2025) · Mingwei Zheng et al. · 4.93
- Enhancing Software Vulnerability Detection Using Code Property Graphs and Convolutional Neural Networks (2025) · Amanpreet Singh Saimbhi · 4.87
- Combining Language and App UI Analysis for the Automated Assessment of Bug Reproduction Steps (2025) · Junayed Mahmud et al. · 4.82
- LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models (2025) · Mohamad Fakih et al. · 4.76
- Test Wars: A Comparative Study of SBST, Symbolic Execution, and LLM-Based Approaches to Unit Test Generation (2025) · Azat Abdullin et al. · 4.76
- Code Change Intention, Development Artifact and History Vulnerability: Putting Them Together for Vulnerability Fix Detection by LLM (2025) · Xu Yang et al. · 4.76
- Are Large Language Models Memorizing Bug Benchmarks? (2024) · Daniel Ramos et al. · 4.65
- ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart Contracts (2024) · Lyuye Zhang, Kaixuan Li, Kairan Sun, Daoyuan Wu, Ye Liu, Haoye Tian, and Yang Liu · 4.63
- Learning Software Bug Reports: A Systematic Literature Review (2025) · Guoming Long et al. · 4.58
- On the Challenges of Fuzzing Techniques via Large Language Models (2024) · Linghan Huang et al. · 4.57
- A Preliminary Study of Large Language Models for Multilingual Vulnerability Detection (2025) · Junji Yu et al. · 4.47
- LLM4SZZ: Enhancing SZZ Algorithm with Context-Enhanced Assessment on Large Language Models (2025) · Lingxiao Tang et al. · 4.42
- Benchmarking LLM for Code Smells Detection: OpenAI GPT-4.0 vs DeepSeek-V3 (2025) · Ahmed R. Sadik et al. · 4.42
- VeriDebug: A Unified LLM for Verilog Debugging via Contrastive Embedding and Guided Correction (2025) · Ning Wang et al. · 4.42
- SPOQ: Specialist Orchestrated Queuing for Multi-Agent Software Engineering (2026) · Royce Carbowitz et al. · 4.39
- Decoupled Smart Contract Audits: Lightweight LLM Framework via Distillation and Aggregation (2026) · Bagus Rakadyanto Oktavianto Putra et al. · 4.39
- Learn from Your Mistakes: Tree-like Self-Play for Secure Code LLMs (2026) · Wenqi Chen et al. · 4.39
- FLARE: Fine-Grained Diagnostic Feedback for LLM Code Refinement (2026) · Yinsheng Yao et al. · 4.39
- Multi-task LLMs for Bug Classification: Efficient Inference with Auxiliary Decoding Heads (2026) · Nikolai Rozanov · 4.39
- Data-aware Static Analysis: Improving Detection of Semantic Faults in Machine Learning Code Using Data Characteristics (2026) · Willem Meijer et al. · 4.39
- SEC-bench Pro: Can Language Models Solve Long-Horizon Software Security Tasks? (2026) · Hwiwon Lee et al. · 4.33
- ConVer: Using Contracts and Loop Invariant Synthesis for Scalable Formal Software Verification (2026) · Muhammad A. A. Pirzada et al. · 4.33
- ProDebug: An Automated Debugging System for Prolog (2026) · Ricardo Brancas et al. · 4.33
- AssertLLM2: A Comprehensive LLM Benchmark for Assertion Generation from Design Specifications (2026) · Yuchao Wu et al. · 4.33
- Poison with Style: A Practical Poisoning Attack on Code Large Language Models (2026) · Khang Tran et al. · 4.33
- Multi-Agent LLM-based Metamorphic Testing for REST APIs (2026) · Shehroz Khan et al. · 4.33
- Learning the Error Patterns of Language Models (2026) · Jinwoo Kim et al. · 4.33
- Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification (2026) · Dylan Bouchard et al. · 4.33
- Towards Demystifying and Repairing LLM-in-the-Loop Vulnerabilities (2026) · Yujie Ma et al. · 4.33
- Usability Analysis of Configurator User Interfaces with Multimodal Large Language Models (2026) · Sebastian Lubos et al. · 4.33
- Inferring Code Correctness from Specification (2026) · Tambon Florian et al. · 4.33
- Projectional Decoding: Towards Semantic-Aware LLM Generation (2026) · Boqi Chen et al. · 4.33