TDPGen: Optimizing Agentic Code Generation via Test-Driven Planning and Hierarchical ReAct

Abstract

Driven by the reasoning and text generation capabilities of Large Language Models (LLMs), multi-agent approaches to code generation have attracted increasing research attention. However, existing frameworks often lack effective planning guidance and robust debugging mechanisms, so agents frequently fall into ineffective trial-and-error loops that ultimately yield erroneous code. To overcome these limitations, we introduce TDPGen, a multi-agent architecture that integrates Test-Driven Planning with a Hierarchical ReAct strategy. Unlike conventional approaches, our framework employs a Test Agent to generate test cases a priori, transforming implicit task constraints into explicit planning directives. This mechanism guides the Planning Agent to mitigate potential risks during plan formulation. Furthermore, we design a hierarchical ReAct mechanism in which the Coding Agent can escalate persistent errors (those that remain unresolved after the debugging budget is exhausted) to the planning layer for high-level revision and refactoring, breaking code debugging deadlocks. Empirical results on five datasets (HumanEval, HumanEval-ET, MBPP, MBPP-ET, and APPS) indicate that TDPGen outperforms established baselines. Notably, with the compact Qwen2.5-Coder-7B model, our framework achieves improvements in Pass@1 accuracy of 10.4% on HumanEval and 16.2% on MBPP. These findings confirm the value of combining pre-emptive test constraints with hierarchical debugging strategies, offering a robust solution for reliable agent-based code synthesis.
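
The escalation loop described in the abstract might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `solve`, `failing_tests`, `planner`, `coder`, `debug_budget`, and `max_replans` are hypothetical names, and the agents are modeled as plain callables rather than LLM calls. The tests are fixed before planning, the coding loop retries within a budget, and unresolved failures are passed back to the planner.

```python
from typing import Callable, List, Optional, Tuple

Test = Tuple[tuple, object]  # (positional arguments, expected result)


def failing_tests(code: str, entry: str, tests: List[Test]) -> List[str]:
    """Run candidate code and return one message per failing test."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # illustration only; a real system should sandbox this
    except Exception as exc:
        return [f"code failed to load: {exc!r}"]
    func = namespace.get(entry)
    if not callable(func):
        return [f"function {entry!r} is not defined"]
    failures = []
    for args, expected in tests:
        try:
            got = func(*args)
        except Exception as exc:
            failures.append(f"{entry}{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{entry}{args} returned {got!r}, expected {expected!r}")
    return failures


def solve(
    task: str,
    entry: str,
    tests: List[Test],
    planner: Callable[[str, List[Test], List[str]], str],
    coder: Callable[[str, List[str]], str],
    debug_budget: int = 3,   # hypothetical: coding attempts allowed per plan
    max_replans: int = 2,    # hypothetical: escalations allowed to the planner
) -> Optional[str]:
    """Return code that passes all tests, or None if every budget is exhausted."""
    feedback: List[str] = []  # persistent errors escalated from the coding layer
    for _ in range(max_replans + 1):
        # Planning layer: sees the tests up front, plus any escalated errors.
        plan = planner(task, tests, feedback)
        errors: List[str] = []
        for _ in range(debug_budget):
            # Coding layer: generate or repair code using the latest test errors.
            code = coder(plan, errors)
            errors = failing_tests(code, entry, tests)
            if not errors:
                return code
        # Debugging budget exhausted: escalate the remaining errors for replanning.
        feedback = errors
    return None
```

Separating the two budgets keeps low-level repair cheap while bounding how often the plan itself is revised; the values shown are placeholders rather than settings reported in the paper.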

