An MRP Formulation For Supervised Learning: Generalized Temporal Difference Learning Models

Abstract

In traditional statistical learning, data points are usually assumed to be independent and identically distributed (i.i.d.), drawn from an unknown probability distribution. This paper presents a contrasting viewpoint: it treats data points as interconnected and models them with a Markov reward process (MRP). We reformulate typical supervised learning as an on-policy policy evaluation problem within reinforcement learning (RL) and introduce a generalized temporal difference (TD) learning algorithm to solve it. Theoretically, our analysis establishes connections between the solutions of linear TD learning and ordinary least squares (OLS). Under specific conditions -- particularly when the noise is correlated -- the TD solution is a more effective estimator than OLS. Furthermore, we show that when our algorithm is applied with many commonly used loss functions -- such as those found in generalized linear models -- it corresponds to the application of a novel and
