KLong: Training LLM Agent For Extremely Long-horizon Tasks
2026 · Yue Liu, Yingwei Ma, Yibo Miao, et al.
Abstract
arXiv:2602.17547v3
This paper introduces KLong, an open-source LLM agent trained to solve extremely long-horizon tasks. The principle is to first cold-start the model via trajectory-splitting SFT, then scale it via progressive RL training. Specifically, we first activate the basic agentic abilities of a base model with a comprehensive SFT recipe. Then, we introduce Research-Factory, an automated pipeline that generates high-quality training data by collecting research papers and constructing evaluation rubrics. Using this pipeline, we build thousands of long-horizon trajectories distilled from Claude 4.5 Sonnet (Thinking). To train with these extremely long trajectories, we propose a new trajectory-splitting SFT, which preserves early context, progressively truncates later context, and maintains overlap between sub-trajectories. In addition, to further improve long-horizon task-solving capability, we propose a novel progressive RL, which schedules training
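The abstract gives only three properties of trajectory-splitting SFT: early context is preserved, later context is truncated, and consecutive sub-trajectories overlap. The sketch below is one possible reading of that description, not the authors' procedure. The function name `split_trajectory` and the parameters `prefix_len`, `window_len`, and `overlap` are hypothetical, and the trajectory is modeled as a plain list of steps.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_trajectory(
    steps: Sequence[T], prefix_len: int, window_len: int, overlap: int
) -> List[List[T]]:
    """Split one long trajectory into shorter, overlapping sub-trajectories.

    Assumed interpretation (not taken from the paper):
      * the first `prefix_len` steps are kept in every sub-trajectory,
      * the remaining steps are covered by a sliding window of `window_len`,
        so steps between the prefix and the window are dropped,
      * consecutive windows share `overlap` steps.
    """
    if prefix_len < 0:
        raise ValueError("prefix_len must be non-negative")
    if not 0 <= overlap < window_len:
        raise ValueError("overlap must satisfy 0 <= overlap < window_len")

    prefix = list(steps[:prefix_len])
    rest = list(steps[prefix_len:])
    if not rest:
        # Nothing beyond the preserved prefix; return it alone if non-empty.
        return [prefix] if prefix else []

    stride = window_len - overlap
    chunks: List[List[T]] = []
    start = 0
    while True:
        chunks.append(prefix + rest[start:start + window_len])
        if start + window_len >= len(rest):
            break  # the current window already reaches the final step
        start += stride
    return chunks


if __name__ == "__main__":
    for chunk in split_trajectory(list(range(10)), prefix_len=2, window_len=4, overlap=1):
        print(chunk)
```

With ten steps, a two-step prefix, a window of four, and an overlap of one, every sub-trajectory starts with steps 0 and 1, and each window repeats the last step of the previous one. In a real setting the lengths would more plausibly be measured in tokens than in steps.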
Related papers
- Himac: Hierarchical Macro-micro Learning For Long-horizon LLM Agents (2026)
- Training Agents With Weakly Supervised Feedback From Large Language Models (2024)
- Agenther: Hindsight Experience Replay For LLM Agent Trajectory Relabeling (2026)
- Agent Lightning: Train ANY AI Agents With Reinforcement Learning (2025)
- DLM: Unified Decision Language Models For Offline Multi-agent Sequential Decision Making (2026)
- Towards Agentic Self-learning LLMs In Search Environment (2025)
- SAC-GLAM: Improving Online RL For LLM Agents With Soft Actor-critic And Hindsight Relabeling (2024)
- Talktoagent: A Human-centric Explanation Of Reinforcement Learning Agents With Large Language Models (2025)