Loss- And Reward-weighting For Efficient Distributed Reinforcement Learning
2023 · Martin Holen, Per-Arne Andersen, Kristian Muri Knausgård, et al.
Abstract
This paper introduces two learning schemes for distributed agents in Reinforcement Learning (RL) environments: Reward-Weighted (R-Weighted) and Loss-Weighted (L-Weighted) gradient merger. The R-Weighted and L-Weighted methods replace standard practices for training multiple agents, such as summing or averaging the gradients. The core of our methods is to scale the gradient of each actor according to how high its reward (for R-Weighted) or its loss (for L-Weighted) is compared to the other actors. During training, each agent operates in a differently initialized version of the same environment, so different actors produce different gradients. In essence, the R-Weights and L-Weights of each agent inform the other agents of its potential, which in turn indicates which environment should be prioritized for learning. This approach to distributed learning is possible because environments that yield higher rewards, or low losses, have more critical information than environments that yield lower reward
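The gradient-scaling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so this example assumes weights proportional to each actor's score (its reward for R-Weighted, its loss for L-Weighted), normalized to sum to one; the function name `weighted_merge`, the non-negativity requirement, and the fallback to a plain average are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def weighted_merge(gradients, scores, eps=1e-8):
    """Merge per-actor gradients with weights proportional to each actor's score.

    gradients: array-like of shape (n_actors, n_params)
    scores:    array-like of shape (n_actors,), e.g. episode rewards
               (R-Weighted) or losses (L-Weighted).
    Assumption (not from the paper): scores are non-negative.
    """
    gradients = np.asarray(gradients, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if np.any(scores < 0):
        raise ValueError("this sketch assumes non-negative scores")

    total = scores.sum()
    if total < eps:
        # All scores are (near) zero: fall back to a plain average.
        weights = np.full(len(scores), 1.0 / len(scores))
    else:
        weights = scores / total

    # Weighted sum over the actor axis.
    merged = weights @ gradients
    return merged, weights


# Two actors with two parameters each; the first actor scored three times higher.
merged, weights = weighted_merge([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
```

With equal scores this reduces to the ordinary gradient average, which is the baseline the abstract says these methods replace.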
Related papers
- Communication-efficient Policy Gradient Methods For Distributed Reinforcement Learning (2018)
- Weighted Double Deep Multiagent Reinforcement Learning In Stochastic Cooperative Environments (2018)
- Scalable And Sample Efficient Distributed Policy Gradient Algorithms In Multi-agent Networked Systems (2022)
- Distributional Reward Estimation For Effective Multi-agent Deep Reinforcement Learning (2022)
- Distributed Policy Gradient With Variance Reduction In Multi-agent Reinforcement Learning (2021)
- The Gradient Convergence Bound Of Federated Multi-agent Reinforcement Learning With Efficient Communication (2021)
- Off-policy Reinforcement Learning With Loss Function Weighted By Temporal Difference Error (2022)
- Noise Distribution Decomposition Based Multi-agent Distributional Reinforcement Learning (2023)