Hidden Markov Model Estimation-based Q-learning For Partially Observable Markov Decision Process
2018 · Hyung-Jin Yoon, Donghwan Lee, Naira Hovakimyan
Abstract
This paper studies an online hidden Markov model (HMM) estimation-based Q-learning algorithm for partially observable Markov decision processes (POMDPs) with finite state and action sets. When the full state is observed, Q-learning finds the optimal action-value function (Q function). However, Q-learning can perform poorly when the full state observation is not available. In this paper, we formulate POMDP estimation as an HMM estimation problem and propose a recursive algorithm that estimates the POMDP parameters and the Q function concurrently. We also show that the POMDP estimate converges to a set of stationary points of the maximum likelihood estimate, and that the Q function estimate converges to a fixed point satisfying the Bellman optimality equation weighted by the invariant distribution of the state belief determined by the HMM estimation process.
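The abstract describes two coupled steps: tracking a belief over hidden states, and updating a Q function weighted by that belief. The sketch below is only an illustration of that idea under simplifying assumptions and is not the paper's algorithm. It assumes the transition matrix and observation likelihoods are already known, whereas the paper estimates them online. The function names (`belief_update`, `belief_weighted_q_update`), the step size `alpha`, the discount `gamma`, and the choice of bootstrap target are all hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def belief_update(belief: Vector, trans_a: Matrix, obs_lik: Vector) -> Vector:
    """One Bayes-filter step over hidden states.

    belief  : current probability of each hidden state
    trans_a : trans_a[i][j] = P(next state j | state i, action taken)
    obs_lik : obs_lik[j] = P(received observation | next state j)
    """
    n = len(belief)
    # Predict: push the belief through the transition model.
    predicted = [sum(belief[i] * trans_a[i][j] for i in range(n)) for j in range(n)]
    # Correct: reweight by how well each state explains the observation.
    unnorm = [predicted[j] * obs_lik[j] for j in range(n)]
    total = sum(unnorm)
    if total == 0.0:
        # Observation is impossible under the model; keep the prediction.
        return predicted
    return [u / total for u in unnorm]


def belief_weighted_q_update(
    q: Matrix,
    belief: Vector,
    next_belief: Vector,
    action: int,
    reward: float,
    alpha: float = 0.1,
    gamma: float = 0.95,
) -> Matrix:
    """Tabular Q update in which each state's entry moves in proportion to its belief.

    Assumption: the bootstrap target is the greedy value of the
    belief-averaged Q function at the next belief.
    """
    n_states = len(q)
    n_actions = len(q[0])
    next_value = max(
        sum(next_belief[j] * q[j][a] for j in range(n_states))
        for a in range(n_actions)
    )
    for s in range(n_states):
        td_error = reward + gamma * next_value - q[s][action]
        q[s][action] += alpha * belief[s] * td_error
    return q
```

In this sketch, a state with zero belief leaves its Q entry untouched, while a state held with certainty receives the ordinary Q-learning update. This is one simple reading of a Bellman equation weighted by the state belief.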
Related papers
- Near-optimal Partially Observable Reinforcement Learning With Partial Online State Information (2023)
- Sample-efficient Learning Of POMDPs With Multiple Observations In Hindsight (2023)
- Efficient Learning Of POMDPs With Known Observation Model In Average-reward Setting (2024)
- Model-based Learning Of Near-optimal Finite-window Policies In POMDPs (2026)
- Computationally Efficient PAC RL In POMDPs With Latent Determinism And Conditional Embeddings (2022)
- Proximal Reinforcement Learning: Efficient Off-policy Evaluation In Partially Observed Markov Decision Processes (2021)
- Provable Representation With Efficient Planning For Partial Observable Reinforcement Learning (2023)
- Provably Efficient Reinforcement Learning In Partially Observable Dynamical Systems (2022)