Transfer Q-learning

Abstract

Time-inhomogeneous finite-horizon Markov decision processes (MDPs) are frequently used to model decision-making in dynamic treatment regimes and other statistical reinforcement learning (RL) settings. Applications, especially in healthcare and business, often face high-dimensional state spaces and time-inhomogeneous dynamics, compounded by small sample sizes that complicate informed decision-making. To overcome these challenges, we investigate knowledge transfer within time-inhomogeneous finite-horizon MDPs by leveraging data from both a target RL task and several related source tasks. We develop transfer learning (TL) algorithms that are adaptable to both batch and online \(Q\)-learning and that integrate information from offline source studies. The proposed transfer \(Q\)-learning algorithm contains a novel re-targeting step that enables cross-stage transfer along multiple stages in an RL task, besides the usual …
