Sparsity Is Necessary: Polynomial-time Stability For Agentic LLMs In Large Action Spaces
2026 · Angshul Majumdar
Abstract
Tool-augmented LLM systems expose a control regime that learning theory has largely ignored: sequential decision-making with a massive discrete action universe (tools, APIs, documents) in which only a small, unknown subset is relevant for any fixed task distribution. We formalize this setting as Sparse Agentic Control (SAC), where policies admit block-sparse representations over $M \gg 1$ actions and rewards depend on sparse main effects and (optionally) sparse synergies. We study $\ell_{1,2}$-regularized policy learning through a convex surrogate and establish sharp, compressed-sensing-style results: (i) estimation and value suboptimality scale as $k (\log M / T)^{1/2}$ under a Policy-RSC condition; (ii) exact tool-support recovery holds via primal-dual witness arguments when $T > k \log M$ under incoherence and $\beta$-min; and (iii) any dense policy class requires $\Omega(M)$ samples, explaining the instability of prompt-only controllers. We further show that under partial observability, LLMs mat
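The $\ell_{1,2}$ penalty named in the abstract sums the Euclidean norms of per-action parameter blocks, so it tends to zero out whole actions rather than individual coordinates. The following is a minimal sketch of the standard proximal step for such a penalty (block soft-thresholding), not the paper's algorithm; the function names, the list-of-blocks layout, and the toy numbers are assumptions made for illustration.

```python
import math


def block_soft_threshold(blocks, tau):
    """Proximal operator of tau * sum_j ||w_j||_2 (hypothetical helper).

    `blocks` is a list of parameter vectors, one per action (assumed layout).
    """
    out = []
    for w in blocks:
        norm = math.sqrt(sum(x * x for x in w))
        # A block whose norm is at most tau is set to zero as a whole;
        # any other block is shrunk toward zero by the factor (1 - tau / norm).
        scale = max(0.0, 1.0 - tau / norm) if norm > 0.0 else 0.0
        out.append([scale * x for x in w])
    return out


def support(blocks):
    """Indices of blocks that are not identically zero (the retained actions)."""
    return [j for j, w in enumerate(blocks) if any(x != 0.0 for x in w)]


# Toy example: three actions with two parameters each (made-up values).
blocks = [[3.0, 4.0], [0.1, -0.2], [0.0, 0.0]]
shrunk = block_soft_threshold(blocks, tau=1.0)
retained = support(shrunk)
```

In this toy input only the first block has norm above the threshold, so it is the only action kept after the step; the weak second block is removed entirely rather than coordinate by coordinate.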
Related papers
- Reinforcement Learning With Sparse-executing Actions Via Sparsity Regularization (2021)
- Online Sparse Reinforcement Learning (2020)
- Bayesian Off-policy Evaluation And Learning For Large Action Spaces (2024)
- SAC-GLAM: Improving Online RL For LLM Agents With Soft Actor-critic And Hindsight Relabeling (2024)
- Himac: Hierarchical Macro-micro Learning For Long-horizon LLM Agents (2026)
- RL In Latent MDPs Is Tractable: Online Guarantees Via Off-policy Evaluation (2024)
- SALSA-RL: Stability Analysis In The Latent Space Of Actions For Reinforcement Learning (2025)
- Learning In Complex Action Spaces Without Policy Gradients (2024)