
AgentPulse: A Continuous Multi-Signal Framework for Evaluating AI Agents in Deployment

2026

Abstract

Static benchmarks measure what AI agents can do at a fixed point in time but not how they are adopted, maintained, or experienced in deployment. We introduce AgentPulse, a continuous evaluation framework scoring 50 agents across 10 workload categories along four factors (Benchmark Performance, Adoption Signals, Community Sentiment, and Ecosystem Health) aggregated from 18 real-time signals across GitHub, package registries, IDE marketplaces, social platforms, and benchmark leaderboards. Three analyses ground the framework. The four factors capture largely complementary information (n=50; \(\rho_{\max} = 0.61\) for Adoption-Ecosystem, all others \(|\rho| \leq 0.37\)). A circularity-controlled test (n=35) shows the Benchmark+Sentiment sub-composite, which contains no GitHub-derived signals, predicts external adoption proxies it does not aggregate: GitHub stars (\(\rho_s = 0.52\), \(p < 0.01\)) and Stack Overflow question volume (\(\rho_s = 0.49\)
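The rank correlations quoted in the abstract (\(\rho_s\)) can be illustrated with a minimal sketch of Spearman's coefficient, computed as the Pearson correlation of average ranks. The abstract does not say how the Benchmark+Sentiment sub-composite is formed, so the equal-weight mean in `sub_composite` is an assumption made only for illustration. All function names are hypothetical, and the scores in the usage lines are made-up placeholders, not AgentPulse data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def sub_composite(benchmark, sentiment):
    """ASSUMPTION: equal-weight mean of two factor scores per agent.
    The actual AgentPulse aggregation is not given in the abstract."""
    return [(b + s) / 2 for b, s in zip(benchmark, sentiment)]


# Placeholder scores for five hypothetical agents (not real data).
benchmark = [0.9, 0.4, 0.7, 0.2, 0.6]
sentiment = [0.8, 0.5, 0.3, 0.4, 0.7]
github_stars = [52000, 8000, 15000, 900, 30000]

composite = sub_composite(benchmark, sentiment)
rho = spearman_rho(composite, github_stars)
```

Because the coefficient depends only on ranks, a heavily skewed proxy such as star counts needs no log transform before comparison. A significance test like the reported \(p < 0.01\) would need an additional step (for example a permutation test), which this sketch omits.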
