Abstract
Recent progress in large language models (LLMs) has significantly improved automated code generation. However, most existing systems generate code without execution awareness and often produce programs that are syntactically correct but semantically invalid or non-executable. The absence of runtime validation and structured debugging limits their reliability in practical software development. This paper presents an execution-guided, autonomous multi-agent framework that improves the robustness of AI-driven code synthesis. The architecture comprises specialized agents for task decomposition, implementation, and validation, coordinated through a centralized orchestration layer. Generated code is executed in a secure containerized sandbox, which enables controlled runtime analysis and structured feedback extraction. Execution traces, error logs, and exception data drive an iterative self-refinement mechanism that allows the system to detect and correct faults autonomously. The framework is modular and extensible, and uses domain-aware prompt conditioning to accommodate frontend, backend, full-stack, and low-level programming tasks. Experimental evaluation shows higher execution success rates and less manual debugging effort than static generation approaches. The proposed method advances execution-aware AI systems toward reliable, self-healing software engineering automation.
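
As an illustration of the execute-and-refine loop summarized above, the following minimal Python sketch runs a candidate program, captures its error output, and passes that output back to a generator for the next attempt. It is not the framework's implementation. The names `ExecutionResult`, `run_candidate`, `refine_until_runs`, and `stub_generate` are hypothetical, the generator is a hard-coded stub standing in for an LLM-backed implementation agent, and a child interpreter in a temporary directory is used in place of the containerized sandbox, so it provides no real isolation.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional, Tuple


@dataclass
class ExecutionResult:
    ok: bool      # True when the program exited with status 0
    stdout: str   # captured standard output
    stderr: str   # captured error output (tracebacks, exception messages)


def run_candidate(source: str, timeout_s: float = 5.0) -> ExecutionResult:
    """Run `source` in a child Python interpreter inside a temporary directory.

    Assumption: this stands in for the containerized sandbox. A child process
    is NOT a security boundary; a real system would use an isolated container.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(source, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            return ExecutionResult(False, "", f"timeout after {timeout_s}s")
        return ExecutionResult(proc.returncode == 0, proc.stdout, proc.stderr)


# Hypothetical generator signature: (task, feedback from last run) -> source code.
Generator = Callable[[str, Optional[str]], str]


def refine_until_runs(
    task: str, generate: Generator, max_rounds: int = 3
) -> Optional[Tuple[str, ExecutionResult]]:
    """Generate, execute, and feed error output back until a run succeeds.

    Returns the first (source, result) pair that exits cleanly, or None if
    no attempt succeeds within `max_rounds`.
    """
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        source = generate(task, feedback)
        result = run_candidate(source)
        if result.ok:
            return source, result
        feedback = result.stderr  # error output drives the next attempt
    return None


def stub_generate(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an implementation agent (no LLM involved).

    The first attempt uses an undefined variable; once the feedback mentions
    a NameError, the stub returns a corrected program.
    """
    if feedback and "NameError" in feedback:
        return "total = sum(range(5))\nprint(total)\n"
    return "print(total)\n"
```

Calling `refine_until_runs("sum the integers 0 to 4", stub_generate)` fails on the first round, forwards the captured traceback to the stub, and succeeds on the second. Bounding the loop with `max_rounds` is a design choice of this sketch, so that a generator that never converges cannot run indefinitely.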