HALyPO: Heterogeneous-agent Lyapunov Policy Optimization For Human-robot Collaboration
2026 · Hao Zhang, Yaru Niu, Yikai Wang, et al.
Abstract
To improve generalization and resilience in human-robot collaboration (HRC), robots must handle the combinatorial diversity of human behaviors and contexts, motivating multi-agent reinforcement learning (MARL). However, inherent heterogeneity between robots and humans creates a rationality gap (RG) in the learning process: a variational mismatch between decentralized best-response dynamics and centralized cooperative ascent. The resulting learning problem is a general-sum differentiable game, so independent policy-gradient updates can oscillate or diverge without added structure. We propose heterogeneous-agent Lyapunov policy optimization (HALyPO), which establishes formal stability directly in the policy-parameter space by enforcing a per-step Lyapunov decrease condition on a parameter-space disagreement metric. Unlike Lyapunov-based safe RL, which targets state/trajectory constraints in constrained Markov decision processes, HALyPO uses Lyapunov certification to stabilize decentralize
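The abstract's claim that independent gradient updates can diverge, and the idea of enforcing a per-step Lyapunov decrease, can be illustrated on a toy problem. The sketch below is not HALyPO's update rule. It assumes a two-parameter bilinear game (a zero-sum special case of a differentiable game), uses the squared distance to the known equilibrium as a stand-in Lyapunov function `V` in place of the paper's disagreement metric, and applies a generic minimal-norm correction to the update direction. All function names and the `alpha` decrease rate are hypothetical.

```python
def independent_step(theta):
    """Simultaneous gradient directions for a toy bilinear game.

    Agent 1 minimizes x*y over x; agent 2 minimizes -x*y over y.
    Each agent follows only its own gradient.
    """
    x, y = theta
    return (-y, x)


def V(theta):
    """Stand-in Lyapunov function (assumption): half the squared
    distance to the equilibrium at the origin. This is NOT the
    paper's parameter-space disagreement metric."""
    x, y = theta
    return 0.5 * (x * x + y * y)


def grad_V(theta):
    """Gradient of the stand-in V, which is theta itself."""
    return theta


def lyapunov_corrected(theta, d, alpha):
    """Smallest change to d such that grad_V . d <= -alpha * V.

    This is the closed-form solution of a one-constraint
    minimal-norm projection. It guarantees the linearized decrease
    only; the exact decrease also needs a small enough step size.
    """
    g = grad_V(theta)
    g_sq = g[0] * g[0] + g[1] * g[1]
    if g_sq == 0.0:
        return d  # already at the equilibrium, nothing to correct
    slack = g[0] * d[0] + g[1] * d[1] + alpha * V(theta)
    lam = max(0.0, slack / g_sq)
    return (d[0] - lam * g[0], d[1] - lam * g[1])


def run(theta, steps, lr, alpha=None):
    """Iterate the update and record V after every step.

    With alpha=None the agents update independently; otherwise
    each step is corrected to satisfy the decrease condition.
    """
    history = [V(theta)]
    for _ in range(steps):
        d = independent_step(theta)
        if alpha is not None:
            d = lyapunov_corrected(theta, d, alpha)
        theta = (theta[0] + lr * d[0], theta[1] + lr * d[1])
        history.append(V(theta))
    return theta, history


# Independent updates spiral away from the equilibrium.
plain_theta, plain_V = run((1.0, 1.0), steps=200, lr=0.1)

# Corrected updates shrink V at every step.
safe_theta, safe_V = run((1.0, 1.0), steps=200, lr=0.1, alpha=1.0)
```

In this toy game the independent direction is orthogonal to `grad_V`, so no step size along it can reduce `V`; the correction adds the smallest component toward the equilibrium needed to meet the decrease condition.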
Related papers
- Heterogeneous Multi-robot Reinforcement Learning (2023)
- Robust And Diverse Multi-agent Learning Via Rational Policy Gradient (2025)
- HyperMARL: Adaptive Hypernetworks For Multi-agent RL (2024)
- Co2po: Coordinated Constrained Policy Optimization For Multi-agent RL (2026)
- Faster Last-iterate Convergence Of Policy Optimization In Zero-sum Markov Games (2022)
- Heterogeneous Multi-agent Reinforcement Learning For Zero-shot Scalable Collaboration (2024)
- Heterogeneous-agent Reinforcement Learning (2023)
- Heterogeneous Multi-agent Reinforcement Learning Via Mirror Descent Policy Optimization (2023)