Time-Scale Separation in Q-Learning: Extending TD(\(\Delta\)) for Action-Value Function Decomposition
2024 · Mahammad Humayoo
Abstract
Q-Learning is a fundamental off-policy reinforcement learning (RL) algorithm that approximates action-value functions in order to learn optimal policies. It struggles, however, to balance bias against variance, particularly when rewards are long-term. This paper introduces Q(\(\Delta\))-Learning, an extension of TD(\(\Delta\)) to the Q-Learning framework. TD(\(\Delta\)) enables efficient learning over several time scales by decomposing the action-value function into components, each associated with a distinct discount factor. This approach improves learning stability and scalability, especially for long-horizon tasks where discounting bias may impede convergence. Our methodology ensures that each component of the Q(\(\Delta\))-function is learned separately, which speeds convergence on shorter time scales and improves learning on longer ones. We demonstrate through theoretical analysis and practical evaluations on standard benchmarks like Atar
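To make the decomposition idea concrete, the following is a minimal tabular sketch, not the paper's implementation. The abstract does not state the update rule, so the rule used here is an assumption: component 0 stores the action-value for the smallest discount factor, and each later component stores the difference between the action-values of two consecutive discount factors, so that a cumulative sum over components recovers the action-value for each discount factor. The function name `q_delta_learning`, the arrays `P` and `R`, and all hyperparameters are hypothetical.

```python
import numpy as np


def q_delta_learning(P, R, gammas, alpha=0.1, steps=50000, eps=0.2, seed=0):
    """Tabular sketch of a discount-factor decomposition for Q-learning.

    Assumptions (not taken from the paper):
      P[s, a] is the deterministic next state, R[s, a] the reward.
      gammas is an increasing list of discount factors.
      W[0] estimates Q for gammas[0]; W[z] estimates the difference
      between Q for gammas[z] and Q for gammas[z - 1].
    """
    rng = np.random.default_rng(seed)
    gammas = np.asarray(gammas, dtype=float)
    n_states, n_actions = R.shape
    W = np.zeros((len(gammas), n_states, n_actions))
    s = 0
    for _ in range(steps):
        # Q[z] = W[0] + ... + W[z] is the estimate for discount gammas[z].
        Q = np.cumsum(W, axis=0)
        # Epsilon-greedy behaviour on the longest-horizon estimate.
        if rng.random() < eps:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(Q[-1, s]))
        s_next, r = int(P[s, a]), float(R[s, a])
        # Bootstrapped term gamma_z * max_a' Q_z(s', a') for every z.
        boot = gammas * Q[:, s_next].max(axis=1)
        targets = np.empty(len(gammas))
        targets[0] = r + boot[0]      # ordinary Q-learning target
        targets[1:] = np.diff(boot)   # difference of consecutive targets
        W[:, s, a] += alpha * (targets - W[:, s, a])
        s = s_next
    return W
```

Because the targets telescope, summing the first z + 1 component updates gives the ordinary Q-learning update for `gammas[z]` when all components share one step size. Any practical benefit would therefore have to come from treating the components differently, for example with separate step sizes, which this sketch does not attempt.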