Globally Convergent Policy Search Over Dynamic Filters For Output Estimation
2022 · Jack Umenberger, Max Simchowitz, Juan C. Perdomo, et al.
Abstract
We introduce the first direct policy search algorithm which provably converges to the globally optimal dynamic filter for the classical problem of predicting the outputs of a linear dynamical system, given noisy, partial observations. Despite the ubiquity of partial observability in practice, theoretical guarantees for direct policy search algorithms, one of the backbones of modern reinforcement learning, have proven difficult to achieve. This is primarily due to the degeneracies which arise when optimizing over filters that maintain internal state. In this paper, we provide a new perspective on this challenging problem based on the notion of informativity, which intuitively requires that all components of a filter's internal state are representative of the true state of the underlying dynamical system. We show that informativity overcomes the aforementioned degeneracy. Specifically, we propose a regularizer which explicitly enforces infor
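To make the problem setting concrete, here is a minimal sketch of output estimation with a dynamic filter. It is not the authors' algorithm: it only evaluates a fixed filter and performs no policy search. It assumes a scalar system x_{t+1} = a x_t + w_t, y_t = c x_t + v_t with unit-variance Gaussian noise. The function name `output_prediction_cost`, the filter parameters `a_k`, `b_k`, `c_k`, and all numeric values are hypothetical choices for illustration. The last call shows one simple source of non-uniqueness: rescaling the filter's internal state leaves its predictions unchanged. Whether this is exactly the degeneracy the abstract refers to is not stated there.

```python
import random


def output_prediction_cost(a_k, b_k, c_k, a=0.9, c=1.0, steps=5000, seed=0):
    """Average squared error of a scalar dynamic filter predicting y_t.

    Assumed system (illustrative):  x_{t+1} = a*x_t + w_t,  y_t = c*x_t + v_t
    Assumed filter (illustrative):  xhat_{t+1} = a_k*xhat_t + b_k*y_t,
                                    yhat_t = c_k*xhat_t
    w_t and v_t are independent standard normal samples.
    """
    rng = random.Random(seed)  # fixed seed so every filter sees the same data
    x = 0.0      # true (hidden) state
    x_hat = 0.0  # filter's internal state
    total = 0.0
    for _ in range(steps):
        y = c * x + rng.gauss(0.0, 1.0)    # noisy, partial observation
        y_hat = c_k * x_hat                # prediction made before seeing y
        total += (y - y_hat) ** 2
        x_hat = a_k * x_hat + b_k * y      # filter state update
        x = a * x + rng.gauss(0.0, 1.0)    # true state update
    return total / steps


# A filter that ignores the observations and always predicts zero.
cost_blind = output_prediction_cost(0.0, 0.0, 0.0)

# A hand-picked stable filter (not claimed to be optimal).
cost_filter = output_prediction_cost(0.4, 0.5, 1.0)

# Same input-output behaviour with the internal state scaled by 10.
cost_rescaled = output_prediction_cost(0.4, 5.0, 0.1)

print(cost_blind, cost_filter, cost_rescaled)
```

A policy search method would adjust `a_k`, `b_k`, and `c_k` to reduce this cost. Because `cost_filter` and `cost_rescaled` agree up to rounding, distinct parameter values can describe the same filter, which is the kind of redundancy that arises once a filter maintains internal state.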
Related papers
- Global Convergence Of Receding-horizon Policy Search In Learning Estimator Designs (2023)
- Conservative Exploration For Policy Optimization Via Off-policy Policy Evaluation (2023)
- Linear Convergence Of A Policy Gradient Method For Some Finite Horizon Continuous Time Control Problems (2022)
- On The Optimization Landscape Of Dynamic Output Feedback: A Case Study For Linear Quadratic Regulator (2022)
- Convergence Of Finite Memory Q-learning For Pomdps And Near Optimality Of Learned Policies Under Filter Stability (2021)
- Convergence And Optimality Of Policy Gradient Methods In Weakly Smooth Settings (2021)
- On The Theory Of Policy Gradient Methods: Optimality, Approximation, And Distribution Shift (2019)
- Online And Lightweight Kernel-based Approximated Policy Iteration For Dynamic P-norm Linear Adaptive Filtering (2022)