Parametrising the Inhomogeneity Inducing Capacity of a Training Set, and its Impact on Supervised Learning
Abstract
We introduce parametrisation of that property of the available training dataset, that necessitates an inhomogeneous correlation structure for the function that is learnt as a model of the relationship between the pair of variables, observations of which comprise the considered training data. We refer to a parametrisation of this property of a given training set, as its ``inhomogeneity parameter''. It is easy to compute this parameter for small-to-large datasets, and we demonstrate such computation on multiple publicly-available datasets, while also demonstrating that conventional ``non-stationarity'' of data does not imply a non-zero inhomogeneity parameter of the dataset. We prove that - within the probabilistic Gaussian Process-based learning approach - a training set with a non-zero inhomogeneity parameter renders it imperative, that the process that is invoked to model the sought function, be non-stationary. Following the learning of a real-world multivariate function with such a Process, quality and reliability of predictions at test inputs, are demonstrated to be affected by the inhomogeneity parameter of the training data.