OCMDP: Observation-constrained Markov Decision Process
2024 · Taiyi Wang, Jianheng Liu, Bryan Lee, et al.
Abstract
In many practical applications, decision-making processes must balance the costs of acquiring information with the benefits it provides. Traditional control systems often assume full observability, an unrealistic assumption when observations are expensive. We tackle the challenge of simultaneously learning observation and control strategies in such cost-sensitive environments by introducing the Observation-Constrained Markov Decision Process (OCMDP), where the policy influences the observability of the true state. To manage the complexity arising from the combined observation and control actions, we develop an iterative, model-free deep reinforcement learning algorithm that separates the sensing and control components of the policy. This decomposition enables efficient learning in the expanded action space by focusing on when and what to observe, as well as determining optimal control actions, without requiring knowledge of the environment's dynamics. We validate our approach on a simu
Related papers
- Sequential Monte Carlo for Policy Optimization in Continuous POMDPs (2025)
- Configurable Markov Decision Processes (2018)
- Robust Anytime Learning of Markov Decision Processes (2022)
- Omega-Regular Decision Processes (2023)
- Model-Based Exploration in Monitored Markov Decision Processes (2025)
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting (2024)
- Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees (2025)
- MDPs with a State Sensing Cost (2025)