Segmenting Action-value Functions Over Time-scales In SARSA Via TD(\(\Delta\))
2024 · Mahammad Humayoo
Abstract
In many episodic reinforcement learning (RL) environments, SARSA-based methods are used to improve policies that aim to maximize returns over long horizons. Traditional SARSA algorithms struggle to balance bias and variance, primarily because they depend on a single, constant discount factor (\(\eta\)). This work extends the temporal difference decomposition method, TD(\(\Delta\)), to the SARSA algorithm; the result is designated SARSA(\(\Delta\)). SARSA is a widely used on-policy RL method that improves action-value functions via temporal difference updates. By splitting the action-value function into components that are each linked to a specific discount factor, SARSA(\(\Delta\)) makes learning easier across a range of time scales. This decomposition makes learning more effective and ensures consistency, particularly in situations where long-horizon improvement is needed. The results of this research show that the sugg
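As a rough illustration of the decomposition the abstract describes, the sketch below assumes the component form used in the original TD(\(\Delta\)) formulation: for increasing discount factors \(\eta_0 < \eta_1 < \dots < \eta_Z\), the first component estimates \(Q_{\eta_0}\) and each later component estimates the difference \(Q_{\eta_z} - Q_{\eta_{z-1}}\), so the components sum to the action value at the largest discount. The function names, the tabular array layout, and the step size are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def sarsa_delta_update(W, s, a, r, s_next, a_next, etas, alpha=0.1, done=False):
    """One tabular SARSA(Delta)-style update (illustrative sketch).

    W has shape (len(etas), n_states, n_actions).
    W[0] estimates Q under etas[0]; W[z] for z >= 1 estimates
    Q under etas[z] minus Q under etas[z - 1].
    """
    # Snapshot the next-step components so every update bootstraps
    # from the same estimates, even if (s, a) == (s_next, a_next).
    if done:
        next_w = np.zeros(len(etas))
    else:
        next_w = W[:, s_next, a_next].copy()

    q_prev_next = 0.0  # running sum equal to Q_{etas[z-1]}(s_next, a_next)
    for z, eta in enumerate(etas):
        if z == 0:
            # Ordinary SARSA target at the smallest discount factor.
            target = r + eta * next_w[0]
        else:
            # Difference target: no reward term, because the reward
            # cancels when subtracting the two Bellman equations.
            target = (eta - etas[z - 1]) * q_prev_next + eta * next_w[z]
        W[z, s, a] += alpha * (target - W[z, s, a])
        q_prev_next += next_w[z]
    return W


def q_value(W, s, a):
    """Recombine the components into the full action-value estimate."""
    return W[:, s, a].sum()
```

On-policy action selection (for example, epsilon-greedy over `q_value`) is left out to keep the sketch short; the point is only that short-horizon components can be learned from their own targets while longer-horizon components learn residuals on top of them.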
Related papers
- Time-scale Separation In Q-learning: Extending TD(\(\triangle\)) For Action-value Function Decomposition (2024)
- Discerning Temporal Difference Learning (2023)
- An Analysis Of Action-value Temporal-difference Methods That Learn State Values (2025)
- Learning Sparse Representations In Reinforcement Learning (2019)
- Time Discretization-invariant Safe Action Repetition For Policy Gradient Methods (2021)
- Sample Complexity Bounds For Two Timescale Value-based Reinforcement Learning Algorithms (2020)
- Prediction And Control In Continual Reinforcement Learning (2023)
- Finite-sample Analysis For SARSA With Linear Function Approximation (2019)