Hindsight Experience Replay With Kronecker Product Approximate Curvature
2020 · Dhuruva Priyan G M, Abhik Singla, Shalabh Bhatnagar
Abstract
Hindsight Experience Replay (HER) is an efficient algorithm for solving Reinforcement Learning tasks in environments with sparse rewards. However, because of its low sample efficiency and slow convergence, HER fails to perform effectively. Natural gradients address these challenges by helping the model parameters converge better, and they avoid taking bad actions that collapse training performance. However, computing these parameter updates in neural networks is expensive and therefore increases training time. Our proposed method addresses these challenges, achieving better sample efficiency, faster convergence, and a higher success rate. A common failure mode for DDPG is that the learned Q-function begins to dramatically overestimate Q-values, which then leads to the policy breaking, because the policy exploits the errors in the Q-function. We solve this issue by including Twin Delayed Deep Deterministic Policy Gradients (TD3) in HER. TD3 learns two Q-functions instead of one and it adds
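To make the hindsight idea concrete, here is a minimal sketch of HER's "future" goal-relabeling strategy. It assumes goal-conditioned transitions and a sparse reward of 0 on success and -1 otherwise. The `Transition` fields, `sparse_reward`, `her_relabel`, the tolerance `tol`, and the number of relabeled copies `k` are hypothetical names chosen for illustration, not the authors' implementation.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    obs: Vec
    action: Vec
    reward: float
    next_obs: Vec
    achieved_goal: Vec  # goal actually reached in next_obs (hypothetical field)
    goal: Vec           # goal the policy was conditioned on


def sparse_reward(achieved: Vec, goal: Vec, tol: float = 0.05) -> float:
    """Assumed sparse reward: 0.0 if the goal is reached within `tol`, else -1.0."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def her_relabel(
    episode: Sequence[Transition], k: int = 4, rng: Optional[random.Random] = None
) -> List[Transition]:
    """Return the original transitions plus k hindsight copies per step ('future' strategy)."""
    if rng is None:
        rng = random.Random(0)
    out: List[Transition] = []
    last = len(episode) - 1
    for t, tr in enumerate(episode):
        out.append(tr)  # keep the real experience
        for _ in range(k):
            # Pick a goal that was actually achieved at this step or later in the episode.
            new_goal = episode[rng.randint(t, last)].achieved_goal
            # Recompute the reward as if new_goal had been the goal all along.
            out.append(
                Transition(
                    tr.obs, tr.action,
                    sparse_reward(tr.achieved_goal, new_goal),
                    tr.next_obs, tr.achieved_goal, new_goal,
                )
            )
    return out


# Usage: a 1-D agent that aims for 1.0 but only reaches 0.3.
goal = (1.0,)
positions = [0.0, 0.1, 0.2, 0.3]
episode = [
    Transition((positions[i],), (0.1,), sparse_reward((positions[i + 1],), goal),
               (positions[i + 1],), (positions[i + 1],), goal)
    for i in range(3)
]
relabeled = her_relabel(episode, k=4)
print(sum(tr.reward == 0.0 for tr in relabeled), "successful transitions out of", len(relabeled))
```

Every original transition in this toy episode fails (reward -1), yet relabeling turns some of them into successes for goals the agent did reach, which is what gives the learner a usable signal under sparse rewards.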
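The overestimation fix described above can be sketched as the clipped double-Q target used in standard TD3: the target actor's action is smoothed with clipped Gaussian noise, and the smaller of the two target critics is used to bootstrap. This is a framework-free sketch under stated assumptions: the function name `td3_target`, the callable critics and actor, and the default values of `gamma`, `policy_noise`, `noise_clip`, and the action bounds are illustrative, not necessarily the authors' configuration. Delayed actor updates are not shown.

```python
import random
from typing import Callable, List, Optional

Vec = List[float]
QFn = Callable[[Vec, Vec], float]


def td3_target(
    reward: float,
    next_obs: Vec,
    done: bool,
    actor_targ: Callable[[Vec], Vec],
    q1_targ: QFn,
    q2_targ: QFn,
    gamma: float = 0.98,         # illustrative discount factor
    policy_noise: float = 0.2,   # illustrative smoothing noise std
    noise_clip: float = 0.5,     # illustrative noise clipping bound
    act_low: float = -1.0,
    act_high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Clipped double-Q target y = r + gamma * (1 - done) * min(Q1', Q2')(s', a~)."""
    if rng is None:
        rng = random.Random(0)
    # Target policy smoothing: add clipped Gaussian noise, then clip to the action bounds.
    smoothed = []
    for a in actor_targ(next_obs):
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, policy_noise)))
        smoothed.append(max(act_low, min(act_high, a + eps)))
    # Taking the smaller of the two critics counters Q-value overestimation.
    q_min = min(q1_targ(next_obs, smoothed), q2_targ(next_obs, smoothed))
    return reward + gamma * (1.0 - float(done)) * q_min


# Usage with toy critics that disagree: the target uses the more pessimistic one.
y = td3_target(-1.0, [0.0], False, lambda s: [0.0],
               lambda s, a: 5.0, lambda s, a: 3.0, gamma=0.5)
print(y)  # prints 0.5
```

Because the target never exceeds the lower critic, an error that inflates only one Q-function cannot propagate into the bootstrapped value, which is the failure mode the abstract attributes to DDPG.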
Related papers
- Bias-reduced Hindsight Experience Replay With Virtual Goal Prioritization (2019)
- Hindsight Policy Gradients (2017)
- Hindsight Trust Region Policy Optimization (2019)
- Adaptable Hindsight Experience Replay For Search-based Learning (2025)
- Remember And Forget For Experience Replay (2018)
- Higher : Improving Instruction Following With Hindsight Generation For Experience Replay (2019)
- Sample Efficiency In Sparse Reinforcement Learning: Or Your Money Back (2020)
- Introspective Experience Replay: Look Back When Surprised (2022)