AgentHER: Hindsight Experience Replay For LLM Agent Trajectory Relabeling
2026 · Liang Ding
Abstract
arXiv:2603.21357v3
LLM agents fail on the majority of real-world tasks -- GPT-4o succeeds on fewer than 15% of WebArena navigation tasks and below 55% pass@1 on ToolBench (Zhou et al., 2024; Qin et al., 2024) -- yet every failed trajectory is routinely discarded, wasting the dominant source of collected experience. We introduce AgentHER, a framework that recovers this lost training signal by adapting the Hindsight Experience Replay (HER; Andrychowicz et al., 2017) principle to natural-language agent trajectories for offline data augmentation. The key insight is simple: a trajectory that fails goal A is often a correct demonstration for some achievable alternative goal B. AgentHER realises this idea through a four-stage pipeline -- failure classification, outcome extraction, LLM-guided prompt relabeling with confidence gating, and data packaging -- that converts discarded failures into high-quality SFT, DPO, and ShareGPT training data, with both zero-co
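The four stages named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Trajectory` fields, the toy classification and extraction rules, the `Relabeler` signature, the 0.8 threshold, and the SFT record layout are all assumptions made for illustration, and the LLM call is replaced by a stub function.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Trajectory:
    goal: str                # original goal A, which the agent failed
    steps: list[str]         # actions the agent took
    final_observation: str   # what the environment showed at the end
    success: bool


# Hypothetical relabeler signature: (outcome, steps) -> (new_goal, confidence).
# In a real system this would wrap an LLM call; here it is any callable.
Relabeler = Callable[[str, list[str]], tuple[str, float]]


def classify_failure(traj: Trajectory) -> str:
    """Stage 1 (toy rule): decide whether a failure is worth relabeling."""
    if traj.success:
        return "not_a_failure"
    if not traj.steps or not traj.final_observation.strip():
        return "unrecoverable"
    return "relabelable"


def extract_outcome(traj: Trajectory) -> str:
    """Stage 2 (toy rule): treat the final observation as what was achieved."""
    return traj.final_observation.strip()


def relabel_with_gate(
    traj: Trajectory, relabeler: Relabeler, threshold: float = 0.8
) -> Optional[dict]:
    """Stages 3 and 4: propose goal B, gate on confidence, package as SFT."""
    if classify_failure(traj) != "relabelable":
        return None
    outcome = extract_outcome(traj)
    new_goal, confidence = relabeler(outcome, traj.steps)
    if confidence < threshold:
        return None  # confidence gate: drop uncertain relabels
    return {
        "prompt": new_goal,
        "response": "\n".join(traj.steps),
        "original_goal": traj.goal,
        "confidence": confidence,
    }


def stub_relabeler(outcome: str, steps: list[str]) -> tuple[str, float]:
    """Stand-in for the LLM: phrases the outcome as a goal, fixed confidence."""
    return f"Reach a state where: {outcome}", 0.9
```

Called on a failed trajectory whose final observation is non-empty, `relabel_with_gate(traj, stub_relabeler)` returns a record whose prompt is the hindsight goal B and whose response is the unchanged action sequence. A relabeler that reports confidence below the threshold yields `None`, so the trajectory stays discarded.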
Authors
Liang Ding
Related papers
- Sample-efficient Online Learning In LM Agents Via Hindsight Trajectory Rewriting (2025) · 1.57
- HIGhER: Improving Instruction Following With Hindsight Generation For Experience Replay (2019) · 6.34
- Bias-reduced Hindsight Experience Replay With Virtual Goal Prioritization (2019) · 9.41
- SAC-GLAM: Improving Online RL For LLM Agents With Soft Actor-critic And Hindsight Relabeling (2024) · 0.00
- Adaptable Hindsight Experience Replay For Search-based Learning (2025) · 0.00
- Klong: Training LLM Agent For Extremely Long-horizon Tasks (2026) · 0.00
- Generalized Decision Transformer For Offline Hindsight Information Matching (2021) · 0.00
- Himac: Hierarchical Macro-micro Learning For Long-horizon LLM Agents (2026) · 0.00