Dr. Jan Hannig and Dr. Steve Marron received an NSF grant on "Statistical approaches to big data analytics"
February 17, 2017

The major challenges to be studied in this grant include Data Integration, Data Heterogeneity and Parallelization. Data Integration addresses the recently recognized need to combine widely differing types of measurements made on a common set of subjects. For example, in cancer research, common measurements in modern Big Data sets include gene expression, copy number, mutations, methylation and protein expression. Deep new statistical methods will be developed that focus on central scientific issues such as how the various measurements interact with each other and, simultaneously, which aspects operate independently. Data Heterogeneity addresses a different issue that is also critical in cancer research. There, current efforts to boost sample sizes (essential to deeper scientific insights) involve multiple laboratories combining their data. This grant will develop a whole new conceptual model for understanding the bias-oriented challenges that such pooling presents, along with the foundations for new analytical methods that are robust against these effects.