Non-parametric Stochastic Policy Gradient With Strategic Retreat For Non-stationary Environment
2022 · Apan Dastider, Mingjie Lin
Abstract
In modern robotics, effectively computing optimal control policies under dynamically varying environments poses substantial challenges to off-the-shelf parametric policy gradient methods, such as Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3). In this paper, we propose a systematic methodology to dynamically learn a sequence of optimal control policies non-parametrically, while autonomously adapting to the constantly changing environment dynamics. Specifically, our non-parametric kernel-based methodology embeds a policy distribution as the features in a non-decreasing Euclidean space, therefore allowing its search space to be defined as a very high (possibly infinite) dimensional RKHS (Reproducing Kernel Hilbert Space). Moreover, by leveraging the similarity metric computed in RKHS, we augmented our non-parametric learning with the technique of AdaptiveH: adaptively selecting a time-frame window of finishing the optimal par
Related papers
- Learning Optimal Deterministic Policies With Stochastic Policy Gradients (2024) 0.00
- Sample Complexity Of Estimating The Policy Gradient For Nearly Deterministic Dynamical Systems (2019) 0.00
- Improved Exploration Through Latent Trajectory Optimization In Deep Deterministic Policy Gradient (2019) 0.00
- Asynchronous Episodic Deep Deterministic Policy Gradient: Towards Continuous Control In Computationally Complex Environments (2019) 0.00
- Smoothing Policies And Safe Policy Gradients (2019) 7.50
- Policy Search By Target Distribution Learning For Continuous Control (2019) 3.58
- Policy Gradient For Continuing Tasks In Non-stationary Markov Decision Processes (2020) 0.00
- Extremum-seeking Action Selection For Accelerating Policy Optimization (2024) 0.00