SPACeR: Self-play Anchoring With Centralized Reference Models
2025 · Wei-Jer Chang, Akshay Rangesh, Kevin Joseph, et al.
Abstract
Developing autonomous vehicles (AVs) requires not only safety and efficiency, but also realistic, human-like behaviors that are socially aware and predictable. Achieving this requires sim agent policies that are human-like, fast, and scalable in multi-agent settings. Recent progress in imitation learning with large diffusion-based or tokenized models has shown that behaviors can be captured directly from human driving data, producing realistic policies. However, these models are computationally expensive, slow during inference, and struggle to adapt in reactive, closed-loop scenarios. In contrast, self-play reinforcement learning (RL) scales efficiently and naturally captures multi-agent interactions, but it often relies on heuristics and reward shaping, and the resulting policies can diverge from human norms. We propose SPACeR, a framework that leverages a pretrained tokenized autoregressive motion model as a centralized reference policy to guide decentralized self-play. The reference
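The excerpt ends before it explains how the reference model guides training. KL regularization is one common way to anchor an RL policy to a reference policy. The Python sketch below assumes that scheme for illustration only; it is not presented as the paper's method. The function names, the `beta` coefficient, and the per-agent dictionaries are all hypothetical. The sketch also assumes that the centralized reference model has already produced, for every agent, logits over the same discrete motion-token vocabulary that each decentralized policy acts in.

```python
import math
from typing import Dict, List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a discrete (tokenized) action vocabulary."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(logp: List[float], logq: List[float]) -> float:
    """KL(p || q) for two categorical distributions given as log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(logp, logq))


def anchored_rewards(
    task_rewards: Dict[str, float],
    policy_logits: Dict[str, List[float]],
    reference_logits: Dict[str, List[float]],
    beta: float = 0.1,
) -> Dict[str, float]:
    """Hypothetical shaped reward (assumed KL-anchoring scheme, not confirmed by the source).

    Each decentralized agent keeps its own task reward, minus beta times the KL
    divergence between its action distribution and the distribution that the
    centralized reference model (which sees the whole scene) assigns to that agent.
    """
    shaped: Dict[str, float] = {}
    for agent_id, reward in task_rewards.items():
        logp = log_softmax(policy_logits[agent_id])     # decentralized policy
        logq = log_softmax(reference_logits[agent_id])  # centralized reference
        shaped[agent_id] = reward - beta * kl_divergence(logp, logq)
    return shaped
```

When an agent's distribution matches the reference, the KL term is zero and the shaped reward equals the task reward. A larger `beta` pulls self-play behavior more strongly toward the reference distribution, trading task performance for closeness to the reference.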
Related papers
- Role Play: Learning Adaptive Role-specific Strategies In Multi-agent Interactions (2024)
- Learning To Simulate Self-driven Particles System With Coordinated Policy Optimization (2021)
- From Centralized To Self-supervised: Pursuing Realistic Multi-agent Reinforcement Learning (2023)
- Learning Complex Spatial Behaviours In ABM: An Experimental Observational Study (2022)
- Learn The Ropes, Then Trust The Wins: Self-imitation With Progressive Exploration For Agentic Reinforcement Learning (2025)
- Robustifying A Policy In Multi-agent RL With Diverse Cooperative Behaviors And Adversarial Style Sampling For Assistive Tasks (2024)
- Spark: Strategic Policy-aware Exploration Via Dynamic Branching For Long-horizon Agentic Learning (2026)
- Model-based Reinforcement Learning For Atari (2019)