Learning Mdps From Features: Predict-then-optimize For Sequential Decision Problems By Reinforcement Learning
2021 Β· Kai Wang, Sanket Shah, Haipeng Chen, et al.
Abstract
In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generalize better to unseen tasks compared to relying on an intermediate loss function for evaluating prediction quality. We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) that are solved via reinforcement learning. In particular, we are given environment features and a set of trajectories from training MDPs, which we use to train a predictive model that generalizes to unseen test MDPs without trajectories. Two significant computational challenges arise in applying decision-focused learning to MDPs: (i) large state and action spaces make it
Authors
(none)
Tags
Stats
Related papers
- Parameterized Mdps And Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework (2020)8.60
- A General Markov Decision Process Framework For Directly Learning Optimal Control Policies (2019)0.00
- From Reinforcement Learning To Optimal Control: A Unified Framework For Sequential Decisions (2019)0.00
- Provably Efficient Ucb-type Algorithms For Learning Predictive State Representations (2023)0.00
- Towards An Adaptable And Generalizable Optimization Engine In Decision And Control: A Meta Reinforcement Learning Approach (2024)0.00
- Solving Robust Mdps Through No-regret Dynamics (2023)0.00
- Goal-oriented Inference Of Environment From Redundant Observations (2023)3.58
- Online Reinforcement Learning In Markov Decision Process Using Linear Programming (2023)3.58