Scaling Self-Evolving Agents via Parametric Memory

Abstract

Existing memory-augmented LLM agents store past experience exclusively in prompt space, as textual summaries or retrieved passages, while keeping model parameters frozen throughout a rollout. Such agents can \emph{look up} what they have seen but cannot \emph{learn from} it: their policy is unchanged by experience, and any information dropped from the context is permanently lost. We introduce \texttt{TMEM}, a self-evolving parametric memory framework in which the agent not only compresses history into explicit memory but also absorbs distilled supervision into fast LoRA weights $\Delta_t$ via lightweight online updates, genuinely altering its future behavior within a single episode. We formalize this as an agentic decision process with fast-weight rollout dynamics: actions are sampled from $\pi_{\theta_0+\Delta_t}$, while extraction actions produce supervision that updates $\Delta_t$ for subsequent decisions. This view makes the extraction policy directly optimizable by RL: training $\theta_0$ improves not only task actions but also the quality of the data used for online LoRA adaptation. We further propose SVD-based initialization of the LoRA subspace to accelerate online convergence. Experiments on LoCoMo, LongMemEval-S, multi-objective search, and CL-Bench show that \texttt{TMEM} consistently outperforms summary-based and retrieval-based baselines across different model scales.
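To make the fast-weight idea concrete, the following is a minimal NumPy sketch of a frozen weight matrix with a low-rank delta that is updated online. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say which matrix is decomposed, which LoRA factors are trained, or what loss is used. Here the down-projection `A` is assumed to be initialized from the top right singular vectors of the frozen weight, the up-projection `B` is assumed to start at zero, and the update is a single gradient step on a squared-error loss. The names `svd_init_lora` and `online_update` are hypothetical.

```python
import numpy as np

def svd_init_lora(W, rank):
    """Hypothetical SVD-based init (an assumption, not the paper's recipe).

    A spans the top-`rank` right singular directions of the frozen weight W.
    B starts at zero, so the initial delta B @ A is exactly zero.
    """
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    A = Vt[:rank].copy()               # shape (rank, d_in), orthonormal rows
    B = np.zeros((W.shape[0], rank))   # shape (d_out, rank)
    return A, B

def online_update(W, A, B, x, y, lr=0.5):
    """One gradient step on 0.5 * ||(W + B @ A) @ x - y||^2 with respect to B.

    W and A stay frozen; only the fast weight B changes.
    """
    h = A @ x                          # low-rank features, shape (rank,)
    err = (W + B @ A) @ x - y          # prediction error, shape (d_out,)
    return B - lr * np.outer(err, h)   # gradient of the loss w.r.t. B is err h^T

def loss(W, A, B, x, y):
    return 0.5 * np.sum(((W + B @ A) @ x - y) ** 2)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))            # stands in for a frozen base weight
A, B0 = svd_init_lora(W, rank=2)

x = rng.normal(size=6)
x = x / np.linalg.norm(x)              # unit-norm input keeps the step stable
y = rng.normal(size=4)                 # stands in for distilled supervision

loss_before = loss(W, A, B0, x, y)
B1 = online_update(W, A, B0, x, y)
loss_after = loss(W, A, B1, x, y)
```

Because `B` starts at zero, the adapted model initially matches the base model, and each update moves behavior only within the subspace selected by `A`. Other initializations, such as ones that also modify the base weight, are equally compatible with the abstract's description.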
