The hierarchical barycenter: conditional probability simulation with structured and unobserved covariates
Abstract
This paper presents a new method for conditional probability density simulation. The method is designed to work with unstructured data sets in which the samples are not all characterized by the same covariates yet share common information. The examples considered in the text fall into two main classes: homogeneous data sets in which some samples have missing covariate values, and data sets divided into two or more groups whose covariates only partially overlap. The methodology is based on the mathematical theory of optimal transport and extends the barycenter problem to the newly defined hierarchical barycenter problem. A new, data-driven numerical procedure for solving the hierarchical barycenter problem is proposed, and its advantages over the classical barycenter are illustrated on synthetic and real-world data sets.