Biologically Plausible Variational Policy Gradient With Spiking Recurrent Winner-take-all Networks
2022 · Zhile Yang, Shangqi Guo, Ying Fang, et al.
Abstract
One stream of reinforcement learning research explores biologically plausible models and algorithms, both to simulate biological intelligence and to fit neuromorphic hardware. Among these, reward-modulated spike-timing-dependent plasticity (R-STDP) is a recent branch with good potential for energy efficiency. However, current R-STDP methods rely on heuristically designed local learning rules and therefore require task-specific expert knowledge. In this paper, we consider a spiking recurrent winner-take-all network and propose a new R-STDP method, spiking variational policy gradient (SVPG), whose local learning rules are derived from the global policy gradient, eliminating the need for heuristic design. In experiments on MNIST classification and Gym InvertedPendulum, SVPG achieves good training performance and is more robust to various kinds of noise than conventional methods.
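As a rough illustration of how a local rule can fall out of a global policy gradient, the sketch below uses a plain (non-spiking) softmax layer as a stand-in for a soft winner-take-all circuit. It is not the SVPG algorithm from the paper. For a softmax policy with linear logits, the REINFORCE gradient for each weight is reward × (postsynaptic activity minus its expected value) × presynaptic activity, which involves only quantities available at that synapse plus a global reward signal. The function names, the two-context toy task, and the learning rate are all assumptions made for this example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def local_policy_gradient_step(weights, x, reward, lr=0.1, rng=random):
    """One REINFORCE step for a softmax (soft winner-take-all) layer.

    weights[i][j] connects input j to output unit i and is updated in place.
    reward is a callable mapping the sampled action to a scalar.
    Returns the sampled action.
    """
    logits = [sum(w * xj for w, xj in zip(row, x)) for row in weights]
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs)[0]
    r = reward(action)
    for i, row in enumerate(weights):
        # Postsynaptic term: did unit i win, minus how likely it was to win.
        post_error = (1.0 if i == action else 0.0) - probs[i]
        for j, xj in enumerate(x):
            # Three-factor update: global reward * post term * pre activity.
            row[j] += lr * r * post_error * xj
    return action


def train_demo(steps=2000, seed=0):
    """Toy task (hypothetical): the correct action equals the context index."""
    rng = random.Random(seed)
    weights = [[0.0, 0.0], [0.0, 0.0]]  # 2 output units, 2 one-hot inputs
    for _ in range(steps):
        ctx = rng.randrange(2)
        x = [1.0, 0.0] if ctx == 0 else [0.0, 1.0]
        local_policy_gradient_step(
            weights, x, lambda a: 1.0 if a == ctx else 0.0, rng=rng
        )
    return weights
```

Calling `train_demo()` returns a weight matrix in which each context drives its matching output unit more strongly than the other. A spiking implementation would replace the softmax sampling with network dynamics, which this sketch does not attempt to model.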
Related papers
- Learning First-to-spike Policies For Neuromorphic Control Using Policy Gradients (2018)
- Stochastic Variance Reduction For Policy Gradient Estimation (2017)
- Deep Reinforcement Learning With Spiking Q-learning (2022)
- Evolving-to-learn Reinforcement Learning Tasks With Spiking Neural Networks (2022)
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- Variational Policy Gradient Method For Reinforcement Learning With General Utilities (2020)
- Reinforcement Learning With A Network Of Spiking Agents (2019)
- On The Model-based Stochastic Value Gradient For Continuous Reinforcement Learning (2020)