AgentPulse: A Continuous Multi-Signal Framework for Evaluating AI Agents in Deployment

2026

Abstract

Static benchmarks measure what AI agents can do at a fixed point in time but not how they are adopted, maintained, or experienced in deployment. We introduce AgentPulse, a continuous evaluation framework scoring 50 agents across 10 workload categories along four factors (Benchmark Performance, Adoption Signals, Community Sentiment, and Ecosystem Health) aggregated from 18 real-time signals across GitHub, package registries, IDE marketplaces, social platforms, and benchmark leaderboards. Three analyses ground the framework. The four factors capture largely complementary information (n=50; $\rho_{\max}=0.61$ for Adoption-Ecosystem, all others $|\rho| \leq 0.37$). A circularity-controlled test (n=35) shows that the Benchmark+Sentiment sub-composite, which contains no GitHub-derived signals, predicts external adoption proxies it does not aggregate: GitHub stars ($\rho_s=0.52$, $p<0.01$) and Stack Overflow question volume ($\rho_s=0.49$
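To make the circularity-controlled test concrete, here is a minimal sketch of how a Spearman rank correlation between a Benchmark+Sentiment sub-composite and an external proxy such as GitHub stars could be computed. It is not the authors' pipeline. The equal-weight averaging in `sub_composite`, the function names, and all numbers are illustrative assumptions, and the sketch computes only the coefficient, not a p-value.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def sub_composite(benchmark, sentiment):
    """Equal-weight mean of two factor scores (an assumed weighting)."""
    return [(b + s) / 2 for b, s in zip(benchmark, sentiment)]


# Made-up scores for six hypothetical agents (not data from the paper).
benchmark = [0.82, 0.64, 0.71, 0.40, 0.55, 0.90]
sentiment = [0.70, 0.52, 0.66, 0.35, 0.63, 0.75]
github_stars = [41000, 9000, 15000, 1200, 22000, 38000]

composite = sub_composite(benchmark, sentiment)
rho = spearman_rho(composite, github_stars)
print(f"Spearman rho = {rho:.2f}")
```

The sub-composite is built only from inputs that exclude the proxy being predicted, so a high rank correlation cannot come from the score simply containing the star count.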
