Simultaneous Training Of First- And Second-order Optimizers In Population-based Reinforcement Learning
2024 · Felix Pfeiffer, Shahram Eivazi
Abstract
The tuning of hyperparameters in reinforcement learning (RL) is critical, as these parameters significantly impact an agent's performance and learning efficiency. Dynamic adjustment of hyperparameters during the training process can significantly enhance both the performance and stability of learning. Population-based training (PBT) provides a method to achieve this by continuously tuning hyperparameters throughout the training. This ongoing adjustment enables models to adapt to different learning stages, resulting in faster convergence and overall improved performance. In this paper, we propose an enhancement to PBT by simultaneously utilizing both first- and second-order optimizers within a single population. We conducted a series of experiments using the TD3 algorithm across various MuJoCo environments. Our results, for the first time, empirically demonstrate the potential of incorporating second-order optimizers within PBT-based RL. Specifically, the combination of the K-FAC optimi
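The PBT loop described in the abstract can be illustrated with a minimal sketch of one exploit-and-explore step over a population that mixes first- and second-order optimizers. This is not the authors' implementation: the `Member` class, the `exploit_and_explore` function, the truncation fraction, the perturbation factors, and the choice that each member keeps its own optimizer type while inheriting weights and learning rate are all assumptions made for illustration. The optimizer field is only a label here; no actual Adam or K-FAC update is performed.

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class Member:
    """One agent in the population (hypothetical structure for illustration)."""
    optimizer: str   # label only: "adam" (first-order) or "kfac" (second-order)
    lr: float        # a hyperparameter tuned by PBT
    weights: list = field(default_factory=list)  # stand-in for network parameters
    score: float = 0.0                           # latest evaluation return


def exploit_and_explore(population, rng, frac=0.25, perturb=(0.8, 1.2)):
    """Apply one truncation-selection PBT step in place.

    Exploit: each member in the bottom `frac` copies the weights and learning
    rate of a randomly chosen member from the top `frac`.
    Explore: the copied learning rate is multiplied by a random factor.
    Assumption: a member's optimizer type is never changed by this step.
    """
    ranked = sorted(population, key=lambda m: m.score, reverse=True)
    k = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:k], ranked[-k:]
    for loser in bottom:
        winner = rng.choice(top)
        loser.weights = copy.deepcopy(winner.weights)
        loser.lr = winner.lr * rng.choice(perturb)
    return population


if __name__ == "__main__":
    rng = random.Random(0)
    # Half the population is labelled first-order, half second-order.
    population = [
        Member("adam" if i % 2 == 0 else "kfac", lr=1e-3,
               weights=[float(i)], score=float(i))
        for i in range(8)
    ]
    exploit_and_explore(population, rng)
    for m in population:
        print(m.optimizer, m.lr, m.weights, m.score)
```

In a real training run, each member would train for a fixed number of steps with its own optimizer between calls to such a step, and `score` would be refreshed from evaluation episodes before ranking.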
Related papers
- Generalized Population-based Training For Hyperparameter Optimization In Reinforcement Learning (2024)
- Data Efficient Training For Reinforcement Learning With Adaptive Behavior Policy Sharing (2020)
- A Hierarchical Two-tier Approach To Hyper-parameter Optimization In Reinforcement Learning (2019)
- Automatic Tuning Of Hyper-parameters Of Reinforcement Learning Algorithms Using Bayesian Optimization With Behavioral Cloning (2021)
- Phasic Diversity Optimization For Population-based Reinforcement Learning (2024)
- Hyperparameter Tuning For Deep Reinforcement Learning Applications (2022)
- Coevolving With The Other You: Fine-tuning LLM With Sequential Cooperative Multi-agent Reinforcement Learning (2024)
- Sample-efficient Automated Deep Reinforcement Learning (2020)