## Grad Student Seminar: Carson Mosso, Jonghwan Yoo

**Carson Mosso – Latent Association Mining in Binary Data**

We will introduce a new data mining method for binary-valued data, called Latent Association Mining in Binary Data. The problem originates in market basket analysis, where binary-valued data are common, and it typically falls under the branch of data mining called Association Rule Mining. However, the problem generalizes to mining for correlation between variables in other types of datasets, e.g., text or gene expression data. First, we will introduce a latent variable model and define a new statistic called latent association. This statistic is similar to correlation but better suited to our latent variable model. Then we will define the association structure that we are interested in mining, along with an iterative hypothesis testing algorithm to find it. This talk will discuss both statistical theory and real data applications. We will also spend time introducing Association Rule Mining, to put the problem in its proper context, and discuss two related latent variable methods: Latent Dirichlet Allocation and Nonnegative Matrix Factorization.
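
The abstract does not define latent association itself, so the following is only a minimal sketch of the classical Association Rule Mining quantities the talk builds on: the support of an itemset, the confidence of a rule, and the ordinary correlation between two binary variables (the phi coefficient). The basket matrix and item names are hypothetical, and none of this is the speaker's latent association statistic.

```python
import numpy as np

# Hypothetical toy data: rows are transactions, columns are items (1 = purchased).
items = ["bread", "butter", "milk", "beer"]
baskets = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
])


def support(cols):
    """Fraction of transactions that contain every item in `cols`."""
    return baskets[:, cols].all(axis=1).mean()


def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: supp(A and C) / supp(A)."""
    return support(antecedent + consequent) / support(antecedent)


def phi(i, j):
    """Pearson correlation of two binary columns (the phi coefficient)."""
    return np.corrcoef(baskets[:, i], baskets[:, j])[0, 1]


bread, butter = items.index("bread"), items.index("butter")
print(support([bread, butter]))       # 0.6
print(confidence([bread], [butter]))  # about 0.75
print(phi(bread, butter))             # about 0.612
```

Note that confidence is asymmetric (here butter -> bread has confidence 1.0 while bread -> butter has about 0.75), whereas correlation is symmetric; this is one reason correlation-style statistics are a natural starting point when mining associations between variables rather than directed rules.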

**Jonghwan Yoo – Integrative Data Analysis on H&E Images and Methylation Data**

Due to advances in technology, various types of data on a common set of subjects or samples have become available. For instance, genetic, genomic, epigenetic, neurocognitive, clinical, and image data can be collected for each subject. Each data type provides shared or partly independent information, represented in different ways. Integrating information from these sources is therefore essential to gain a broader and deeper understanding of the subjects. One effective approach is Angle-Based Joint and Individual Variation Explained (AJIVE). We apply AJIVE to H&E images and methylation data from a skin cancer dataset. A main challenge is the enormous size of the H&E images, which is tackled with a deep learning technique for automated feature extraction.
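
AJIVE compares the low-rank signal subspaces of the data blocks through principal angles to find a joint score space shared by all blocks, then splits each block into joint and individual parts. The following is a simplified two-block sketch of that idea, not the speaker's pipeline: it assumes the blocks are already feature matrices over the same samples (for H&E images, that would be after feature extraction), and it replaces the published method's data-driven joint-rank selection (Wedin and random-direction bounds) with a hypothetical fixed angle cutoff, `max_angle_deg`. The helper names and the data are also hypothetical and synthetic.

```python
import numpy as np


def score_basis(X, rank):
    """Orthonormal n x rank basis of the dominant row space of a d x n block."""
    # Rows of X are features; columns are the n shared samples (subjects).
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:rank].T


def low_rank(X, rank):
    """Best rank-`rank` approximation of X via truncated SVD."""
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]


def ajive_two_block(X1, X2, r1, r2, max_angle_deg=45.0):
    """Simplified two-block AJIVE-style joint/individual split.

    Assumption: a fixed principal-angle cutoff (`max_angle_deg`, hypothetical)
    replaces the data-driven joint-rank bounds of the published method.
    """
    V1, V2 = score_basis(X1, r1), score_basis(X2, r2)
    # For two orthonormal bases, the leading squared singular values of
    # [V1 V2] are 1 + cos(theta_i), where theta_i are the principal angles.
    u, s, _ = np.linalg.svd(np.hstack([V1, V2]), full_matrices=False)
    threshold = 1.0 + np.cos(np.deg2rad(max_angle_deg))
    joint_rank = int(np.sum(s**2 > threshold))
    J = u[:, :joint_rank]                 # n x joint_rank joint score basis
    P_joint = J @ J.T                     # projector onto the joint score space
    P_indiv = np.eye(J.shape[0]) - P_joint

    parts = []
    for X, r in ((X1, r1), (X2, r2)):
        joint = X @ P_joint
        # Individual signal: low-rank part of what lies outside the joint space.
        individual = low_rank(X @ P_indiv, max(r - joint_rank, 0))
        parts.append((joint, individual))
    return joint_rank, J, parts


# Synthetic example (not real image or methylation data): both blocks share
# the sample scores z, and each block also has its own scores w1 or w2.
rng = np.random.default_rng(0)
n = 200
z, w1, w2 = rng.standard_normal((3, n))
X1 = (np.outer(rng.standard_normal(30), z) + np.outer(rng.standard_normal(30), w1)
      + 0.01 * rng.standard_normal((30, n)))
X2 = (np.outer(rng.standard_normal(40), z) + np.outer(rng.standard_normal(40), w2)
      + 0.01 * rng.standard_normal((40, n)))

joint_rank, J, parts = ajive_two_block(X1, X2, r1=2, r2=2)
print(joint_rank)  # number of shared directions found
```

Working in the sample (score) space is what lets blocks with very different numbers of features, such as image features and methylation probes, be compared directly: only their n-dimensional score subspaces enter the angle computation.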