← all papers · overview

Towards Understanding, Analyzing, And Optimizing Agentic AI Execution: A Cpu-centric Perspective

·2025

Abstract

Agentic AI serving converts monolithic LLM-based inference to autonomous problem-solvers that can plan, call tools, perform reasoning, and adapt on the fly. Due to diverse task execution need, such serving heavily rely on heterogeneous CPU-GPU systems with majority of the external tools responsible for agentic capability, either run on or are orchestrated by the CPU. Towards having a deeper understanding of its role, this paper aims to characterize and analyze the system bottlenecks introduced by agentic AI workloads from a largely overlooked CPU-centric perspective. We first present a compile-time characterization of agentic AI execution and choose representative workloads to capture the algorithmic diversity. We then perform runtime characterization of the representative workloads analyzing the end-to-end latency and throughput on two different hardware systems to isolate respective architectural bottlenecks. Based on the insights on the bottlenecks, we finally present two scheduli

Related papers

Ranked by semantic similarity — how closely each paper's abstract matches this one (100% = near-identical topic).