Design and Empirical Evaluation of a Four-Layer AI Agent Architecture for Automated Web Application Security Testing

B. Kulambayev·G. Astaubayeva·Zhanna Mukanova·Kuralay Makhmetova·S. Mambetov·S. Joldasbayev·2026

Abstract

This study proposes a four-layer AI agent architecture for automating routine web security operations, integrating Large Language Model (LLM) reasoning with a hybrid Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) detection engine and implementing a Reasoning-Acting (ReAct) loop for autonomous testing with human-in-the-loop validation. The proposed architecture was empirically evaluated across 50 web applications sourced from OWASP WebGoat, DVWA, and custom-developed test environments over a six-month period. The experimental results demonstrate that the AI agent achieved an overall detection accuracy of 89.2% (95% CI: 86.4-92.0%), significantly outperforming traditional automated methods (67.4% accuracy, p < 0.001). Mean Time to Remediation (MTTR) decreased from 74.3 days to 28.5 days (61.6% reduction), while false positive rates decreased from 24.3% to 4.8%. According to these findings, AI agent-driven automation can substantially enhance the efficiency and reliability of web security testing. However, human expertise remains important for assessing complex vulnerabilities and detecting zero-day threats.

Abstract

Related papers