Personalized Multi-agent Average Reward TD-learning Via Joint Linear Approximation
2026 · Leo Muxing Wang, Pengkun Yang, Lili Su
Abstract
We study personalized multi-agent average reward TD learning, in which a collection of agents interact with different environments and jointly learn their respective value functions. We focus on the setting where there exists a shared linear representation, and the agents' optimal weights collectively lie in an unknown linear subspace. Inspired by the recent success of personalized federated learning (PFL), we study the convergence of cooperative single-timescale TD learning in which agents iteratively estimate the common subspace and local heads. We show that this decomposition can filter out conflicting signals, effectively mitigating the negative impacts of "misaligned" signals, and achieving linear speedup. The main technical challenges lie in the heterogeneity, the Markovian sampling, and their intricate interplay in shaping error evolutions. Specifically, not only are the error dynamics of multiple variables closely interconnected, but there is also no direct contraction fo
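To make the decomposition in the abstract concrete, the following is a minimal sketch of one synchronous update in which each agent's weight vector is written as a shared basis times a local head, theta_i = B w_i. It is an illustration under stated assumptions, not the paper's algorithm: the function name, the QR re-orthonormalization, the averaging rule for the shared basis, and the step sizes `alpha` and `beta` are all hypothetical choices made here.

```python
import numpy as np

def personalized_avg_reward_td_step(B, W, rbar, phi, phi_next, rewards,
                                    alpha=0.05, beta=0.05):
    """One illustrative step for N agents (hypothetical sketch).

    B: (d, r) shared basis, W: (N, r) local heads, rbar: (N,) average-reward
    estimates, phi / phi_next: (N, d) current / next features, rewards: (N,).
    """
    theta = W @ B.T  # (N, d): per-agent weights theta_i = B w_i
    # Average-reward TD error: r - rbar + phi'^T theta - phi^T theta
    delta = rewards - rbar + np.sum((phi_next - phi) * theta, axis=1)
    # Running estimate of each agent's average reward
    rbar_new = rbar + beta * (rewards - rbar)
    # Local head update: w_i += alpha * delta_i * B^T phi_i
    W_new = W + alpha * delta[:, None] * (phi @ B)
    # Shared update: agent-averaged direction delta_i * phi_i w_i^T
    # (assumed rule; same step size alpha to mimic a single timescale)
    G = (phi * delta[:, None]).T @ W / len(W)  # (d, r)
    # Re-orthonormalize the basis; absorb R into the heads so that the
    # product B w_i is not disturbed by the change of basis
    Q, R = np.linalg.qr(B + alpha * G)
    return Q, W_new @ R.T, rbar_new

# Tiny usage example on random data (no environment is simulated here)
rng = np.random.default_rng(0)
N, d, r = 4, 6, 2
B0, _ = np.linalg.qr(rng.normal(size=(d, r)))
W0 = rng.normal(size=(N, r))
rbar0 = np.zeros(N)
phi, phi_next = rng.normal(size=(N, d)), rng.normal(size=(N, d))
rew = rng.normal(size=N)
B1, W1, rbar1 = personalized_avg_reward_td_step(B0, W0, rbar0, phi, phi_next, rew)
```

Only the shared basis is averaged across agents in this sketch, while each head is updated from its own agent's TD error, which is one simple way to keep agent-specific signals out of the common estimate.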
Related papers
- Finite-sample Analysis Of Decentralized Temporal-difference Learning With Linear Function Approximation (2019)
- Finite-time Performance Of Distributed Temporal Difference Learning With Linear Function Approximation (2019)
- On The Linear Speedup Of Personalized Federated Reinforcement Learning With Shared Representations (2024)
- Collaborative Value Function Estimation Under Model Mismatch: A Federated Temporal Difference Analysis (2025)
- Multi-agent Off-policy TD Learning: Finite-time Analysis With Near-optimal Sample Complexity And Communication Complexity (2021)
- Local Stochastic Approximation: A Unified View Of Federated Learning And Distributed Multi-task Reinforcement Learning Algorithms (2020)
- Multi-agent Reinforcement Learning Via Double Averaging Primal-dual Optimization (2018)
- Fast Multi-agent Temporal-difference Learning Via Homotopy Stochastic Primal-dual Optimization (2019)