Stepsize Learning For Policy Gradient Methods In Contextual Markov Decision Processes
2023 · Luca Sabbioni, Francesco Corda, Marcello Restelli
Abstract
Policy-based algorithms are among the most widely adopted techniques in model-free RL, thanks to their strong theoretical groundings and good properties in continuous action spaces. Unfortunately, these methods require precise and problem-specific hyperparameter tuning to achieve good performance, and tend to struggle when asked to accomplish a series of heterogeneous tasks. In particular, the selection of the step size has a crucial impact on their ability to learn a highly performing policy, affecting the speed and the stability of the training process, and often being the main culprit for poor results. In this paper, we tackle these issues with a Meta Reinforcement Learning approach, by introducing a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL with contextual processes. After providing a theoretical Lipschitz bound to the difference of performance in different tasks, we adopt the proposed framework to train a batch RL algo
Related papers
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024)
- Efficient Off-policy Meta-reinforcement Learning Via Probabilistic Context Variables (2019)
- Metatrace Actor-critic: Online Step-size Tuning By Meta-gradient Descent For Reinforcement Learning Control (2018)
- Learning Deterministic Policies With Policy Gradients In Constrained Markov Decision Processes (2025)
- Guided Meta-policy Search (2019)
- Learning To Explore With Meta-policy Gradient (2018)
- Double Meta-learning For Data Efficient Policy Optimization In Non-stationary Environments (2020)
- Multi-timescale Ensemble Q-learning For Markov Decision Process Policy Optimization (2024)