Optimal Scheduling Of Entropy Regulariser For Continuous-time Linear-quadratic Reinforcement Learning
2022 · Lukasz Szpruch, Tanut Treetanthiploet, Yufei Zhang
Abstract
This work uses the entropy-regularised relaxed stochastic control perspective as a principled framework for designing reinforcement learning (RL) algorithms. Here, the agent interacts with the environment by generating noisy controls distributed according to the optimal relaxed policy. On the one hand, the noisy policies explore the space and hence facilitate learning; on the other hand, they introduce bias by assigning positive probability to non-optimal actions. This exploration-exploitation trade-off is determined by the strength of the entropy regularisation. We study algorithms resulting from two entropy regularisation formulations: the exploratory control approach, where entropy is added to the cost objective, and the proximal policy update approach, where entropy penalises policy divergence between consecutive episodes. We focus on the finite-horizon continuous-time linear-quadratic (LQ) RL problem, where a linear dynamics with unknown drift coefficients is controlled subject to quadr
Related papers
- Exploration Versus Exploitation In Reinforcement Learning: A Stochastic Control Approach (2018)
- Fast Policy Learning For Linear Quadratic Control With Entropy Regularization (2023)
- A Comparative Theoretical Analysis Of Entropy Control Methods In Reinforcement Learning (2026)
- Entropy Regularized Reinforcement Learning Using Large Deviation Theory (2021)
- Sublinear Regret For A Class Of Continuous-time Linear-quadratic Reinforcement Learning Problems (2024)
- Marginalized State Distribution Entropy Regularization In Policy Optimization (2019)
- Continuous-time Risk-sensitive Reinforcement Learning Via Quadratic Variation Penalty (2024)
- Regularization Matters In Policy Optimization (2019)