PhD Defense: Peiyao Wang
Title: Flexible Supervised Learning for Heterogeneous Data
Abstract: Data heterogeneity is a challenging problem in modern data analysis. In particular, many classical statistical methods perform poorly on heterogeneous datasets because their key homogeneity assumption fails. In this dissertation, we develop several new regression techniques for data from heterogeneous populations. In the first project, we propose a flexible local regression framework for data that can be grouped into several ordered subtypes. We define a new “progression score” that captures the progression of ordinal classes and use this score to construct the local weights in a shrinkage varying-coefficient model. In the second and third projects, we study the classical regression problem for multi-group data with heterogeneous subpopulations. In this setting, a single global model can be too restrictive because it ignores the data heterogeneity, whereas group-specific models fit each group separately and therefore cannot adequately capture the information shared across groups. We propose two flexible models that simultaneously quantify the information jointly shared across groups and the information specific to each group. In both models, the response is represented as a decomposition into heterogeneous and homogeneous terms. In the second project, this decomposition is driven by a factor decomposition of the covariates; in the third project, it is achieved through a more general latent component regression setup. To demonstrate the effectiveness of the proposed models for heterogeneous data, we carry out theoretical studies and comparative numerical studies. We further illustrate the advantages of the proposed models through an analysis of the Alzheimer’s Disease Neuroimaging Initiative data.