A Unified Approach For Multi-step Temporal-difference Learning With Eligibility Traces In Reinforcement Learning
2018 · Long Yang, Minhao Shi, Qian Zheng, et al.
Abstract
Recently, a new multi-step temporal-difference learning algorithm, called \(Q(\sigma)\), unifies \(n\)-step Tree-Backup (when \(\sigma=0\)) and \(n\)-step Sarsa (when \(\sigma=1\)) by introducing a sampling parameter \(\sigma\). However, like other multi-step temporal-difference learning algorithms, \(Q(\sigma)\) requires considerable memory and computation time. Eligibility traces are an important mechanism for transforming off-line updates into efficient on-line ones that consume less memory and computation time. In this paper, we further develop the original \(Q(\sigma)\), combine it with eligibility traces, and propose a new algorithm, called \(Q(\sigma,\lambda)\), in which \(\lambda\) is the trace-decay parameter. This idea unifies Sarsa\((\lambda)\) (when \(\sigma=1\)) and \(Q^{\pi}(\lambda)\) (when \(\sigma=0\)). Furthermore, we give an upper error bound for the \(Q(\sigma,\lambda)\) policy evaluation algorithm. We prove that the \(Q(\sigma,\lambda)\) control algorithm can converge to the op
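The abstract does not state the update rule, so the following is only a minimal tabular sketch of how a \(\sigma\)-weighted TD error might be combined with an accumulating eligibility trace. It assumes the TD error interpolates between the sampled next-state value (the Sarsa\((\lambda)\) case, \(\sigma=1\)) and the expectation under the target policy \(\pi\) (the \(Q^{\pi}(\lambda)\) case, \(\sigma=0\)), and that the trace decays by \(\gamma\lambda\). The function names, argument names, and list-of-lists table layout are hypothetical and chosen for illustration.

```python
def sigma_td_error(q, pi, s, a, r, s_next, a_next, gamma, sigma, done=False):
    """TD error mixing a sampled and an expected next-state value.

    Assumption (not from the abstract): sigma=1 uses only the sampled
    value Q(s', a'); sigma=0 uses only the expectation under pi.
    """
    if done:
        target = r
    else:
        sampled = q[s_next][a_next]
        expected = sum(p * v for p, v in zip(pi[s_next], q[s_next]))
        target = r + gamma * (sigma * sampled + (1.0 - sigma) * expected)
    return target - q[s][a]


def q_sigma_lambda_step(q, e, pi, s, a, r, s_next, a_next,
                        alpha, gamma, lam, sigma, done=False):
    """One on-line update of all Q-values through an accumulating trace.

    q, e and pi are lists indexed as table[state][action]; q and e are
    modified in place. The trace form (decay by gamma * lam, then add 1
    for the visited pair) is an assumption of this sketch.
    """
    delta = sigma_td_error(q, pi, s, a, r, s_next, a_next, gamma, sigma, done)
    for si in range(len(e)):
        for ai in range(len(e[si])):
            e[si][ai] *= gamma * lam      # decay every trace
    e[s][a] += 1.0                        # mark the visited pair
    for si in range(len(q)):
        for ai in range(len(q[si])):
            q[si][ai] += alpha * delta * e[si][ai]
    return delta


# Tiny two-state, two-action example with a uniform target policy.
q = [[0.0, 0.0], [1.0, 3.0]]
e = [[0.0, 0.0], [0.0, 0.0]]
pi = [[0.5, 0.5], [0.5, 0.5]]
delta = q_sigma_lambda_step(q, e, pi, s=0, a=0, r=1.0, s_next=1, a_next=1,
                            alpha=0.1, gamma=0.5, lam=0.9, sigma=1.0)
```

In this sketch only the TD error depends on \(\sigma\), so a single trace table serves every value of \(\sigma\) between the two extremes.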
Related papers
- Double Q(\(\sigma\)) And Q(\(\sigma, \lambda\)): Unifying Reinforcement Learning Control Algorithms (2017)
- Multi-step Reinforcement Learning: A Unifying Algorithm (2017)
- TBQ(\(\sigma\)): Improving Efficiency Of Trace Utilization For Off-policy Reinforcement Learning (2019)
- Meta-learning State-based Eligibility Traces For More Sample-efficient Policy Evaluation (2019)
- Understanding Multi-step Deep Reinforcement Learning: A Systematic Study Of The DQN Target (2019)
- Meta-learning Eligibility Traces For More Sample Efficient Temporal Difference Learning (2020)
- Adaptive Tree Backup Algorithms For Temporal-difference Reinforcement Learning (2022)
- Trajectory-aware Eligibility Traces For Off-policy Reinforcement Learning (2023)