Convex Programs And Lyapunov Functions For Reinforcement Learning: A Unified Perspective On The Analysis Of Value-based Methods
2022 · Xingang Guo, Bin Hu
Abstract
Value-based methods play a fundamental role in Markov decision processes (MDPs) and reinforcement learning (RL). In this paper, we present a unified control-theoretic framework for analyzing value-based methods such as value computation (VC), value iteration (VI), and temporal difference (TD) learning (with linear function approximation). Building on an intrinsic connection between value-based methods and dynamical systems, we directly use existing convex testing conditions from control theory to derive various convergence results for these value-based methods. The testing conditions are convex programs in the form of either linear programming (LP) or semidefinite programming (SDP), and can be solved to construct Lyapunov functions in a straightforward manner. Our analysis reveals some intriguing connections between feedback control systems and RL algorithms. It is our hope that such connections can inspire more work at the intersection of system/control theory and RL.
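To make the idea of a linear-inequality test concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's exact condition: the 3-state chain `P`, rewards `R`, discount `GAMMA`, and all helper names are hypothetical. For value computation V <- R + gamma * P V, the difference e between any two iterates obeys e <- gamma * P e. If a vector xi > 0 satisfies gamma * P xi <= rho * xi elementwise (a set of linear inequalities, so an LP feasibility problem), then the weighted max-norm V(e) = max_i |e_i| / xi_i shrinks by at least the factor rho per step. Here xi is guessed as the all-ones vector rather than found by a solver.

```python
# Hypothetical 3-state Markov chain under a fixed policy (illustrative numbers).
P = [[0.5, 0.5, 0.0],
     [0.1, 0.6, 0.3],
     [0.2, 0.0, 0.8]]
R = [1.0, 0.0, 2.0]
GAMMA = 0.9


def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vc_step(V):
    """One value computation step: V <- R + gamma * P V."""
    return [r + GAMMA * pv for r, pv in zip(R, matvec(P, V))]


def certificate_holds(xi, rho, tol=1e-12):
    """Check the linear inequalities xi > 0 and gamma * P xi <= rho * xi."""
    a_xi = [GAMMA * y for y in matvec(P, xi)]
    positive = all(x > 0 for x in xi)
    decreasing = all(a <= rho * x + tol for a, x in zip(a_xi, xi))
    return positive and decreasing


def lyapunov(e, xi):
    """Weighted max-norm candidate V(e) = max_i |e_i| / xi_i."""
    return max(abs(ei) / x for ei, x in zip(e, xi))


# Rows of P sum to one, so xi = (1, 1, 1) is feasible with rho = gamma.
xi = [1.0, 1.0, 1.0]
feasible = certificate_holds(xi, GAMMA)

# Run two trajectories; their difference follows e <- gamma * P e exactly.
V, W = [0.0, 0.0, 0.0], [10.0, -5.0, 3.0]
history = []
for _ in range(20):
    history.append(lyapunov([v - w for v, w in zip(V, W)], xi))
    V, W = vc_step(V), vc_step(W)

# The Lyapunov candidate should decay by at least gamma at every step.
contracts = all(b <= GAMMA * a + 1e-12 for a, b in zip(history, history[1:]))
print(feasible, contracts)
```

In a more general setting one would hand the inequalities to an LP solver to search for xi; the same pattern with a quadratic function and a matrix inequality gives an SDP test.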
Related papers
- The Value-improvement Path: Towards Better Representations For Reinforcement Learning (2020) 6.77
- On The Continuity And Smoothness Of The Value Function In Reinforcement Learning And Optimal Control (2024) 0.00
- Orchestrated Value Mapping For Reinforcement Learning (2022) 0.00
- On The Limited Representational Power Of Value Functions And Its Links To Statistical (In)efficiency (2024) 0.00
- Disentangling Dynamics And Returns: Value Function Decomposition With Future Prediction (2019) 0.00
- Value-biased Maximum Likelihood Estimation For Model-based Reinforcement Learning In Discounted Linear MDPs (2023) 0.00
- A General Markov Decision Process Framework For Directly Learning Optimal Control Policies (2019) 0.00
- Foresee Then Evaluate: Decomposing Value Estimation With Latent Future Prediction (2021) 3.58