ToMPO: Training LLM Strategic Decision Making From A Multi-agent Perspective
2025 · Yiwen Zhang, Ziang Chen, Fanqi Kong, et al.
Abstract
Large Language Models (LLMs) have been used to make decisions in complex scenarios that require them to think deeply, reason logically, and decide wisely. Many existing studies focus solely on multi-round conversations in social tasks or simulated environments, neglecting the various types of decisions and their interdependence. Current reinforcement learning methods struggle to consider the strategies of others during training. To address these issues, we first define a strategic decision-making problem that includes two types of decisions and their temporal dependencies. Furthermore, we propose the **T**heory **o**f **M**ind **P**olicy **O**ptimization **(ToMPO)** algorithm to optimize the perception of other individuals' strategies and of trends in the game situation. Compared to the Group Relative Policy Optimization (GRPO) algorithm, ToMPO enhances the LLM's strategic decision-making mainly by: 1) generating rollouts based on reasoning about the strategies of other individuals, 2) estimating
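For context on the GRPO baseline named above, the following is a minimal sketch of the group-relative advantage that GRPO is generally described as using: each rollout's reward is normalized against the mean and standard deviation of its own sampling group. It illustrates the baseline only, not ToMPO, whose details are cut off in this excerpt. The function name `group_relative_advantages`, the `eps` stabilizer, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own sampling group.

    rewards: scalar rewards for rollouts sampled from the same prompt.
    eps: small constant (an assumption) that avoids division by zero
         when every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an illustrative choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of one prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Rollouts that score above the group mean receive positive advantages and those below receive negative ones, so no separate learned value function is needed to form the baseline.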
Related papers
- End-to-end Optimization Of LLM-driven Multi-agent Search Systems Via Heterogeneous-group-based Reinforcement Learning (2025)
- MARSHAL: Incentivizing Multi-agent Reasoning Via Self-play With Strategic LLMs (2025)
- YOLO-MARL: You Only LLM Once For Multi-agent Reinforcement Learning (2024)
- Agent-Pro: Learning To Evolve Via Policy-level Reflection And Optimization (2024)
- DLM: Unified Decision Language Models For Offline Multi-agent Sequential Decision Making (2026)
- True Knowledge Comes From Practice: Aligning LLMs With Embodied Environments Via Reinforcement Learning (2024)
- Language-driven Coordination And Learning In Multi-agent Simulation Environments (2025)
- Language Agents With Reinforcement Learning For Strategic Play In The Werewolf Game (2023)