STOR Colloquium: David Dunson, Duke University
Probabilistic modeling of big tables and networks
In many applications, the data are high-dimensional, complex, and highly structured discrete observations. Our focus here is on high-dimensional unordered categorical data, which arise in epidemiology, social surveys, and brain connectomics. In the first part of the talk, I will focus on data that can be arranged as a multiway contingency table but that otherwise have no obvious structure a priori. For such problems, we rely on probabilistic tensor factorizations: I will introduce new classes of factorizations, discuss their relationships with sparse log-linear models, sketch theory on rates of convergence, and consider applications in social science surveys and genomics. In the second part of the talk, I will focus on the case in which the categorical data consist of indicators of connections between pairs of nodes in a network, motivated in particular by brain connectomic studies. The probability distribution for such network-valued random variables can be conveniently expressed through a hierarchical latent space representation. We propose a Bayesian approach to inference and show exciting results on inferring how brain structure varies with phenotype.