PhD Defense: Jack Prothero
Modern data collection in bioinformatics and other big-data paradigms often incorporates traits derived from multiple points of view of the observations. We call such data multi-view or multi-block data. The field of data integration develops and applies new methods for studying multi-block data and identifying how different data blocks relate and differ. One major frontier in contemporary data integration research is methodology that can identify partially shared structure among sub-collections of data blocks. This thesis presents our method for locating partially shared structure in multi-block data: Data Integration Via Analysis of Subspaces (DIVAS). DIVAS combines new insights in angular subspace perturbation theory with recent developments in matrix signal processing and convex-concave optimization into a single algorithm for parsing partially shared structure.
An ever-present yet under-examined aspect of statistical analysis, integrative or otherwise, is data matrix centering. We find that additional forms of centering can produce novel modes of variation in functional data analysis (FDA) and data integration. We propose a unified framework and new terminology for centering operations. We illustrate the intuition behind, and the consequences of, each centering choice with informative graphics. We also propose a new direction energy hypothesis test as part of a series of diagnostics for determining which choice of centering best suits a given data set.
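To make the idea of a centering choice concrete, the sketch below shows three textbook ways of centering a data matrix: subtracting row means, subtracting column means, and doing both (often called double centering). It is an illustration only. The function names and the toy matrix are hypothetical, and these generic operations are not the framework, terminology, or diagnostics proposed in the thesis.

```python
# Minimal sketch of three generic centering operations on a data matrix,
# stored as a list of rows. All names here are illustrative assumptions,
# not the terminology proposed in the thesis.

def row_means(X):
    """Mean of each row."""
    return [sum(row) / len(row) for row in X]

def column_means(X):
    """Mean of each column."""
    n_rows = len(X)
    return [sum(col) / n_rows for col in zip(*X)]

def center_rows(X):
    """Subtract each row's mean from that row."""
    return [[x - m for x in row] for row, m in zip(X, row_means(X))]

def center_columns(X):
    """Subtract each column's mean from that column."""
    means = column_means(X)
    return [[x - m for x, m in zip(row, means)] for row in X]

def center_double(X):
    """Subtract row and column means, then add back the grand mean."""
    r = row_means(X)
    c = column_means(X)
    grand = sum(r) / len(r)  # equals the overall mean for a rectangular matrix
    return [[x - ri - cj + grand for x, cj in zip(row, c)]
            for row, ri in zip(X, r)]

# Toy data matrix (hypothetical values).
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 9.0]]

X_row = center_rows(X)        # every row now sums to zero
X_col = center_columns(X)     # every column now sums to zero
X_double = center_double(X)   # rows and columns both sum to zero
```

Each choice removes a different mean component, so the variation left over for later analysis differs from one choice to the next.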
Both DIVAS and data matrix centering are illustrated throughout using multi-block data sets concerning cancer genomics and 20th-century mortality.