Hotelling Lectures to be presented by Aad van der Vaart

February 23, 2017

This year’s Hotelling Lectures will be presented by Aad van der Vaart, Professor and Scientific Director of the Mathematical Institute of Leiden University.  The subject will be Nonparametric Bayesian methods: frequentist analysis.

 

Aad van der Vaart studied mathematics, philosophy and psychology at the University of Leiden, and received a PhD in mathematics from the same university in 1987. He held positions in College Station, Texas and in Paris (not the one in Texas), held a Miller fellowship at Berkeley, and was a visiting professor at Berkeley, at Harvard and in Seattle. Following a long connection to the Vrije Universiteit Amsterdam, he is currently Professor of Stochastics at Leiden University. He is a member of the Royal Netherlands Academy of Arts and Sciences. His research has been funded by NWO, VU-USF, STW, CMSB, NDNS+, STAR, and most recently by the European Research Council (ERC Advanced Grant, 2012). He received the C.J. Kok prize in 1988, the van Dantzig award in 2000, and the NWO Spinoza Prize in 2015.

Aad van der Vaart’s research is in statistics and probability, both as mathematical disciplines and in their applications to other sciences, with an emphasis on statistical models with large parameter spaces. He has written books and lecture notes (on topics such as empirical processes, time series, stochastic integration, option pricing, statistical genetics, statistical learning, and Bayesian nonparametrics), as well as research papers. See his research page for more information.

Aad van der Vaart was an associate editor of the Annals of Statistics, Statistica Neerlandica, Annales de l’Institut Henri Poincaré, and Probability Theory and Related Fields, a co-editor of Statistics and Decisions, and is currently an associate editor of Indagationes Mathematicae, the Journal of Statistical Planning and Inference, and ALEA. Keynote lectures include the Forum Lectures at the EMS 2009, the Le Cam lecture at the JSM 2009, an invited address at the International Congress of Mathematicians in 2010, and a foundational lecture at the 2012 world meeting of the International Society for Bayesian Analysis. He was program chair of the European Meeting of Statisticians 2006 in Oslo and of BNP10 (2015) in Raleigh, and local chair of BNP9 and of the European Meeting of Statisticians 2015.

His former administrative roles include president of the Netherlands Society for Statistics and Operations Research (2003-07), head of the Department of Mathematics of VU University (2002-06), chair of the European Council of the Bernoulli Society, scientific chair of the Stieltjes Institute, chair of the mathematics board of the Lorentz Centre, board member of the NDNS+ and STAR clusters, and council member of the International Statistical Institute. He is currently a council member of the Institute of Mathematical Statistics and a member of the steering committee of the Statistical Science master's programme in Leiden. Since September 2015 he has been Scientific Director of the Mathematical Institute of Leiden University.

The Hotelling Lectures are an annual event in the Department of Statistics & Operations Research at the University of North Carolina at Chapel Hill, honoring the memory of Professor Harold Hotelling, our first chairman. This year we are honored to have Professor Aad van der Vaart of Leiden University deliver our two Hotelling lectures, which are open to the public. See our upcoming events for details about times and locations.

Dr. Kai Zhang received NSF BIGDATA grant on “Statistical Theory and Methods Beyond the Dimensionality Barrier”

February 18, 2017

With recent advances in technology, it is now possible to measure and record very large numbers of features on a single individual. The volume, velocity, and variety, the “3Vs”, of Big Data pose significant challenges for the modeling and analysis of these massive datasets. For example, to understand cancer at the genetic level, researchers need to detect rare and weak signals from thousands, or even millions, of candidate genetic markers obtained from a limited number of subjects. Existing methods typically assume that the number of subjects is very large, an assumption often violated in practice. The main goal of this project is to develop efficient methods for extremely high-dimensional, small-sample-size data. The methodological advances will be extremely valuable in addressing Big Data challenges in areas such as medical research, bioinformatics, financial analysis, and astronomical image analysis. Efficient software packages and algorithms implementing the proposed methods will be developed and made publicly available.

 

The key innovative idea motivating this research is viewing a high-dimensional problem from a novel packing perspective, which allows the number of variables, p, to be arbitrarily large and the number of observations, n, to be finite. The proposed research will systematically investigate three fundamental problems under this “finite n, arbitrarily large p” paradigm: (1) asymptotic theory of spurious correlations, (2) fast detection of low-rank correlation structures, and (3) detection boundary and optimal testing procedures for detecting rare and weak signals. This research will transform the current asymptotic framework, transitioning from the regimes of “large n, small p” and “large n, larger p” to the regime of “finite n, arbitrarily large p”.
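
As a hedged illustration of the “finite n, arbitrarily large p” phenomenon (a simulation sketch only, not part of the project's methodology), the snippet below fixes a small sample size and lets the number of independent noise variables grow, showing the maximal spurious sample correlation with an unrelated response creeping toward 1.

```python
# Illustrative simulation (not the project's methodology): with a fixed,
# small sample size n, the maximal absolute sample correlation between a
# response and p independent noise variables grows toward 1 as p increases.
import numpy as np

rng = np.random.default_rng(0)
n = 20  # finite sample size

for p in (10, 100, 1_000, 10_000):
    y = rng.standard_normal(n)
    X = rng.standard_normal((n, p))              # p pure-noise predictors
    yc = (y - y.mean()) / y.std()
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    corrs = Xc.T @ yc / n                        # sample correlations with y
    print(f"p = {p:6d}  max |spurious correlation| = {np.abs(corrs).max():.3f}")
```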

Dr. Andrew Nobel received NIH collaboration grant on “Multi-tissue and network models for next-generation eQTL studies”

February 17, 2017

Expression quantitative trait loci (eQTL) studies seek to identify genomic variants that influence the expression of particular genes, and thereby influence higher-level biological functions. The study of eQTLs has proven to be a useful tool in the study of biological pathways that underlie disease in human and other populations. Until recently, most eQTL analyses in humans were carried out using samples from blood. However, the Genotype-Tissue Expression (GTEx) consortium and other groups have recently begun assembling large databases that include genomic variants and expression in multiple tissues. The research supported by the grant will address a number of statistical challenges that arise from these large, multi-tissue data sets. These include performing inference across many (20 to 30) tissues at the same time, identifying genomic control of genes that are located far away from a variant, and accurately quantifying and estimating the statistical effect of a genomic variant on the expression of a gene.

The supported research has four specific Aims: (1) to develop bipartite extensions of statistical tools from the analysis of networks that can enhance the identification of distal (trans) eQTLs; (2) to develop new statistical methods for fast eQTL association mapping that provide reliable estimates of effect size; (3) to extend an existing multi-tissue eQTL procedure developed by the PIs into a High-Tissue modeling platform capable of handling existing data sets with 20 to 30 tissues; and (4) to develop gene-based statistical models for eQTL analysis. Development of the proposed methods will be driven by recent, large-scale eQTL studies in which the PIs have played key roles. The resulting computational tools will address current, critical shortcomings in the analysis of these new data sets, and will have broad utility for the wider eQTL analysis community.
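
For readers unfamiliar with eQTL association mapping, the toy sketch below shows the basic single-variant test that Aim (2) seeks to scale up and equip with reliable effect-size estimates. The simulated genotype dosages, expression values, effect size, and simple linear-regression test are illustrative assumptions, not the PIs' methods (SciPy is assumed to be available).

```python
# Toy illustration (not the PIs' methods): a basic eQTL association test
# regresses a gene's expression on genotype dosage (0/1/2 copies of the
# minor allele) and tests whether the slope is zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
genotype = rng.binomial(2, 0.3, size=n)                # 0/1/2 dosage
expression = 0.4 * genotype + rng.standard_normal(n)   # assumed true effect 0.4

slope, intercept, r, p_value, se = stats.linregress(genotype, expression)
print(f"estimated effect size = {slope:.3f}, p-value = {p_value:.2e}")
```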

PIs Nobel and Wright have been members of the Analysis Working Group of the Genotype-Tissue Expression (GTEx) Consortium since 2010. Their labs have contributed software and statistical analyses to the ongoing activities of the Consortium, including a recent cover article in Science. The research in the grant will extend the PIs' existing work, and will explore new software and analysis methods that can be applied by biomedical researchers working in genomics and other fields.

Authors/roles: Andrew Nobel and Fred Wright, co-PIs

Dr. Jan Hannig and Dr. Steve Marron received NSF grant on “Statistical approaches to big data analytics”

February 17, 2017

The major challenges to be studied in this grant include Data Integration, Data Heterogeneity, and Parallelization. Data Integration addresses the recently recognized need to combine widely differing types of measurements made on a common set of subjects. For example, in cancer research, common measurements in modern Big Data sets include gene expression, copy number, mutations, methylation, and protein expression. Deep new statistical methods will be developed that focus on central scientific questions such as how the various measurements interact with each other and, simultaneously, which aspects operate in an independent manner. Data Heterogeneity addresses a different issue that is also critical in cancer research: current efforts to boost sample sizes (essential for deeper scientific insight) involve multiple laboratories combining their data. The project will develop a new conceptual model for understanding the bias-related challenges this presents, together with foundations for new analytical methods that are robust against such effects.
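
As a loose illustration of the data-integration setting (an assumption-laden sketch, not the methods the grant will develop), the snippet below simulates two measurement types on a common set of subjects and uses the correlation between each block's leading principal-component scores as a crude indicator of shared structure.

```python
# Simple illustration (not the grant's methods): two data blocks measured on
# the same subjects; the correlation between their leading principal-component
# scores gives a crude measure of shared (joint) structure across blocks.
import numpy as np

rng = np.random.default_rng(2)
n = 100
shared = rng.standard_normal(n)                     # signal common to both blocks
expr = np.outer(shared, rng.standard_normal(50)) + rng.standard_normal((n, 50))
meth = np.outer(shared, rng.standard_normal(30)) + rng.standard_normal((n, 30))

def leading_score(X):
    """Subject scores on the first principal component of a centered block."""
    Xc = X - X.mean(axis=0)
    u, s, vt = np.linalg.svd(Xc, full_matrices=False)
    return u[:, 0] * s[0]

r = np.corrcoef(leading_score(expr), leading_score(meth))[0, 1]
print(f"correlation of leading PC scores across blocks: {abs(r):.2f}")
```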

Dr. Shankar Bhamidi and Dr. Andrew Nobel received NSF grant on “Iterative Testing Procedures and High-dimensional Scaling Limits of Extremal Random Structures”

February 17, 2017

Over the past ten years there has been a great deal of work in the statistics community devoted to the problem of testing and estimating associations, in particular correlations, between variables in high dimensional data sets. By definition, correlations capture pairwise relationships between variables, and there is a close formal relationship between the statistical analysis of correlations and the statistical analysis of networks. The statistical activity surrounding inference concerning correlations has been motivated in large part by the increasing use and importance of networks in a variety of fields, including economics, brain mapping, genomics and biomedicine. Networks of proteins associated with a disease can point the way towards potential drug interventions; known networks may serve as inputs for predictive models of survival or response to therapy in breast cancer and other diseases. Concurrent with this growth in statistical methodology, recent developments in the fields of probabilistic combinatorics and machine learning have significantly advanced our understanding of discrete random structures that capture the association of high-dimensional objects. Although these powerful theoretical techniques can be brought directly to bear on a number of the correlation based problems considered in the statistical community, to date no such cross-fertilization has taken place.
The proposed research has several complementary components. The first component is the development of an iterative testing procedure that identifies self-associated sets of vertices in a graph, and self-associated sets of variables in a high-dimensional data set. Within the framework of the iterative testing procedure we will develop computationally efficient methods for several applied problems: mining block correlation differences in two-sample studies, and identifying groups of mutually correlated variables in studies where each sample is assessed with two or more measurement platforms. As a special case of the latter problem, we will develop tools to enhance the power of genomic studies that link local genetic variation to global changes in gene expression. Development and application of the methods will be carried out in cooperation with researchers in genomics, biomedicine, and sociology at UNC, with whom the PI and co-PI have long-standing collaborations. The second component of the proposed research is to adapt and extend existing techniques in probabilistic combinatorics to provide supporting theory for the iterative testing procedure, and to address broader statistical questions concerning the testing and estimation of correlations.
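
A crude stand-in for the kind of output such a procedure targets (illustration only, not the proposed iterative testing method) is to threshold the sample correlation matrix and read off connected components of the resulting graph as candidate self-associated variable sets; the threshold and planted factor below are arbitrary assumptions.

```python
# Crude stand-in (not the proposed iterative testing procedure): threshold the
# sample correlation matrix and treat connected components of the resulting
# graph as candidate self-associated variable sets.
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 30
X = rng.standard_normal((n, p))
X[:, :5] += rng.standard_normal((n, 1))            # variables 0-4 share a factor

R = np.corrcoef(X, rowvar=False)
adj = (np.abs(R) > 0.4) & ~np.eye(p, dtype=bool)   # correlation graph

# Connected components via a simple depth-first search.
unvisited, components = set(range(p)), []
while unvisited:
    stack, comp = [unvisited.pop()], set()
    while stack:
        v = stack.pop()
        comp.add(v)
        nbrs = set(map(int, np.flatnonzero(adj[v]))) & unvisited
        unvisited -= nbrs
        stack.extend(nbrs)
    components.append(sorted(comp))

print([c for c in components if len(c) > 1])       # expect roughly [0, 1, 2, 3, 4]
```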

Dr. Kai Zhang received NSF grant on “Geometric Perspectives on the Correlation”

February 16, 2017

In modern statistical analysis, datasets often contain a large number of variables with complicated dependence structures. This situation is especially common in important problems in economics, engineering, finance, genetics, genomics, neuroscience, and other fields. One of the most important measures of dependence between variables is the correlation coefficient, which describes their linear dependence. In this high-dimensional setting, understanding the correlation and the behavior of correlated variables is a crucial problem and prompts statisticians to develop new theories and methods. Motivated by this challenge, the PI proposes to study the correlation through novel geometric perspectives. The overall objective is (1) to develop useful theories and methods on the correlation and (2) to build a stronger connection between geometry and statistics. The PI will pursue these goals through an integrated program of research and education.

The research agenda is to systematically investigate three fundamental aspects of the correlation: (1) the magnitude and distribution of the maximal spurious sample correlation; (2) the detection of a low-rank correlation structure; and (3) the probability measure over the space of correlation matrices. In these studies, the novel integration of statistical and geometric insights characterizes the proposed solutions and facilitates precise probability statements. Completion of the proposed research will provide a comprehensive understanding of the correlation and a stronger connection between geometry and statistics. The PI also has comprehensive plans for educating graduate and undergraduate students and for disseminating the research results to the broader scientific community.
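
The geometric viewpoint can be previewed with a small simulation (an illustrative sketch, not the proposed research): the sample correlation of two independent Gaussian samples is the cosine of the angle between two random directions, which makes the squared correlation follow a Beta(1/2, (n-2)/2) law; the code below checks this empirically (SciPy is assumed to be available).

```python
# Illustrative check of the geometric view: for independent samples, r is the
# cosine of the angle between two random directions, so r^2 ~ Beta(1/2, (n-2)/2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, reps = 10, 20_000
x = rng.standard_normal((reps, n))
y = rng.standard_normal((reps, n))
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
r = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))

# Compare empirical quantiles of r^2 with the Beta(1/2, (n-2)/2) reference.
qs = np.array([0.5, 0.9, 0.99])
print(np.quantile(r**2, qs))
print(stats.beta.ppf(qs, 0.5, (n - 2) / 2))
```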

Dr. Nilay Argon and Dr. Serhan Ziya received NSF grant on “Distribution of Patients to Medical Facilities in Mass-Casualty Events”

February 16, 2017

Mass-casualty events such as terrorist attacks and natural disasters can affect hundreds to thousands of people and place significant burdens on emergency response systems for unpredictable periods of time. During these events, emergency response management faces several complex operational decisions under time pressure and, at times, security and safety concerns. One fundamental decision is how to distribute casualties from the affected areas to multiple medical facilities that differ in capacity, specialty, and distance. Currently, this decision is left to the emergency transport officer in civilian settings and to battlefield commanders during military operations. Using mathematical modeling and analysis in conjunction with medical expertise, this project will build knowledge and decision tools to make casualty distribution more efficient and objective. This multi-disciplinary project, which brings together operations researchers and emergency physicians, will benefit society directly by facilitating effective casualty distribution during disasters. It will also contribute significantly to the education of a diverse group of students from the operations research, public health, and medical fields.

In its most general form, the casualty-distribution problem is a stochastic sequential decision-making problem that includes various parameters and variables such as the number of casualties at each location; the number of emergency vehicles; the capacity, capability, and congestion level of each hospital; travel times between locations and hospitals; and the condition of travel routes. The first phase of the project involves identifying the most fundamental tradeoffs underlying this complex decision-making problem and formulating separate models for each. These models will then be analyzed by means of exact methods such as sample-path analysis and Markov decision processes to obtain insights about the characteristics of optimal decision rules. In the second phase of the project, approximate approaches such as fluid models and Lagrangian relaxations will be used to develop heuristic policies. In the final phase, an extensive simulation study will be conducted to test the proposed principles and decision rules in more realistic settings using data from the literature and the 2010 National Hospital Ambulatory Medical Care Survey. The mathematical models developed for this project can equivalently be seen as queueing models with dynamic routing. Hence, this project also contributes to the operations research literature by introducing and studying a new class of queue-routing problems, where travel to queues takes time and may require a scarce resource.
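
The flavor of the routing trade-off can be conveyed with a deliberately crude sketch (illustration only; the hospital counts, travel times, and service rates are made-up assumptions, and queues are never drained): routing every casualty to the nearest hospital is contrasted with routing to the hospital with the smallest travel time plus expected wait.

```python
# Toy sketch (not the project's models): compare "nearest hospital" routing
# with "travel time plus expected wait" routing for a stream of casualties.
import numpy as np

travel = np.array([10.0, 25.0, 40.0])   # minutes to each of three hospitals (assumed)
rate = np.array([1.0, 2.0, 3.0])        # patients served per hour (assumed)

def mean_delay(policy, casualties=200):
    queue = np.zeros_like(rate)
    delays = []
    for _ in range(casualties):
        wait = queue / rate * 60.0                   # expected wait in minutes
        score = travel if policy == "nearest" else travel + wait
        h = int(np.argmin(score))
        delays.append(travel[h] + wait[h])
        queue[h] += 1.0                              # queues only grow in this toy
    return float(np.mean(delays))

for policy in ("nearest", "shortest-delay"):
    print(policy, "mean delay (min):", round(mean_delay(policy), 1))
```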

Dr. Quoc Tran-Dinh received NSF grant on “Efficient methods for large scale self concordant convex minimization”

February 15, 2017

Recent progress in modern convex optimization provides a powerful tool for scientific discovery. In theory, many classical convex optimization models have well-understood structures and hence can be solved efficiently by state-of-the-art algorithms. In practice, however, modern applications present a host of increasingly large-scale and nonsmooth models that can render these methods impractical. Fortunately, recent advances in convex optimization offer a surprising new angle from which to fundamentally re-examine the theory and practice of large-scale problems in a unified fashion. This project focuses on exploiting and generalizing a prominent concept, so-called self-concordance, to develop new, efficient convex optimization techniques for attacking two classes of large-scale convex optimization problems. The work is organized into three interdisciplinary work packages (WPs).

WP1. Composite self-concordant convex optimization: While existing convex optimization methods essentially rely on the Lipschitz gradient assumption, the PI instead focuses on the self-concordance structure and its generalizations. Such a concept is key to the theory of interior-point methods, but has remained unexploited in composite minimization. Grounded in this structure, the PI will develop novel and provable convex optimization algorithms for solving several subclasses of large-scale composite convex problems.
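
For readers unfamiliar with the self-concordance structure, the sketch below runs the classical damped Newton iteration (step length 1/(1 + Newton decrement)) on a textbook self-concordant objective. It illustrates the underlying concept only, not the composite algorithms the PI will develop; the objective and data are arbitrary assumptions.

```python
# Minimal sketch (illustration only): damped Newton iterations for the
# self-concordant objective f(x) = c^T x - sum_i log(x_i), x > 0, using the
# classical step length 1 / (1 + Newton decrement). Exact minimizer: x = 1/c.
import numpy as np

c = np.array([1.0, 2.0, 4.0])
x = np.ones_like(c)                              # feasible start, x > 0

for _ in range(25):
    grad = c - 1.0 / x                           # gradient of f
    hess_diag = 1.0 / x**2                       # Hessian is diagonal here
    newton_dir = -grad / hess_diag
    lam = np.sqrt(np.sum(grad**2 / hess_diag))   # Newton decrement
    x = x + newton_dir / (1.0 + lam)             # damped step keeps x > 0
    if lam < 1e-10:
        break

print(x, "vs exact minimizer", 1.0 / c)
```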

WP2. Constrained convex optimization involving self-concordant barriers: Many constrained convex applications involve a self-concordant barrier structure, while their other convex constraints often have a “simple” structure. Existing general-purpose convex algorithms solve these problems mainly by employing either a standard interior-point method or an augmented Lagrangian framework. The PI instead concentrates on exploiting the special structure of these problems, combining the interior-point idea with the proximal framework to develop new and scalable algorithms equipped with rigorous convergence guarantees and amenable to parallel and distributed implementation.

WP3. Implementation and applications: This WP aims at investigating the implementation aspects of the PI’s algorithms and upgrading his SCOPT solver. The theory and methods developed in WP1 and WP2 will be validated through three concrete applications: Poisson imaging, graph learning, and max-cut-type problems. While these applications are different, their underlying convex formulations share the following features: (i) the objective has a non-Lipschitz gradient but possesses a self-concordance structure, and (ii) the problem dimension can easily reach several billion variables.

Dr. Shankar Bhamidi received NSF grant on “Dynamic network models on entrance boundary and continuum scaling limits, condensation phenomena and probabilistic combinatorial optimization”

February 15, 2017

The last few years have witnessed an explosion in the amount of empirical data on real networks, motivating an array of mathematical models for the evolution of such networks. Examples range from biological networks (brain networks of interacting neurons) and information transmission (the Internet) to transportation and social networks, as well as swarm intelligence and the evolution of self-organized behavior through the interactions of simple agents. This has stimulated vigorous activity in a multitude of fields, including biology, statistical physics, statistics, mathematics, and computer science, to understand these models and quantify their predictions and relevance to real systems. The aim of this grant is to develop systematic mathematical theory to understand dynamic networks: systems that evolve over time through probabilistic rules. Using models motivated by colloidal chemistry, we will develop robust mathematical techniques to understand how macroscopic connectivity in the network arises via microscopic interactions between agents in the network. This is important in areas such as epidemic modeling and social networks, where core questions of interest include whether a disease or a message is able to reach a significant fraction of the population of interest. Mathematical techniques used to understand such questions have unexpected connections to combinatorial optimization, where one is interested in designing optimal networks between individuals. The techniques developed in the grant will in particular be used to understand asymptotics in the large-network limit for one of the most fundamental such objects, the minimal spanning tree. Lastly, we will explore meta-heuristics, including swarm optimization algorithms (inspired by the collective behavior of simple individuals such as ants) and their ability to solve hard optimization problems via probabilistic interaction rules through stigmergy (where the network of interacting agents changes the underlying environment, which in turn affects the interactions of the particles). An important component of the grant is the involvement of students at all levels, including the development of undergraduate research seminars and research projects.

The nature of the emergence of the giant component and the critical scaling window in random graph models has stimulated an enormous amount of work in probabilistic combinatorics since the middle of the last century; most techniques deal with gross features such as maximal component sizes. Understanding the metric structure of these components in inhomogeneous random graphs has been particularly daunting, despite being the key to understanding more complicated strong disorder systems. The proposal develops a unified set of tools through dynamic encoding of network models of interest and tracking the entire trajectory of evolution of these systems, in order to understand the metric scaling of the internal structure of maximal components in the critical regime. We aim to show convergence to continuum limiting objects based on tilted inhomogeneous continuum random trees and, in particular, to prove universality for many of the major families of random graph models. Connections between these questions and structural properties of dynamic constructions of random graph models, in particular scaling exponents of key susceptibility functions in the barely subcritical regime, will be studied. The relation between the metric structure of components in the critical regime and the entrance boundary of Markov processes such as the multiplicative coalescent will be explored. The entire program is a first step toward understanding scaling limits of fundamental models in strong disorder, including the minimal spanning tree on the giant component; these models have spawned a wide array of universality conjectures from statistical physics. In a related direction, we will study optimization algorithms and meta-heuristics inspired by reinforcing interacting particle systems and stigmergy, and their relationship to key probabilistic systems such as reinforced random walks and stochastic dynamical systems. The aim in this direction is to provide qualitative insights and quantitative predictions on hard models in probabilistic combinatorial optimization, such as the traveling salesman problem.
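
The emergence of the giant component itself can be seen in a few lines of simulation (an illustrative sketch only, using a Poisson approximation to the number of edges; it says nothing about the metric or critical-window questions studied in the grant): the largest component of G(n, c/n) jumps from a vanishing fraction of n to a positive fraction as c crosses 1.

```python
# Illustrative sketch: largest component fraction of an Erdos-Renyi graph
# G(n, c/n), computed with a union-find over a Poisson number of random edges.
import numpy as np

def largest_component_fraction(n, c, rng):
    parent = list(range(n))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]        # path halving
            v = parent[v]
        return v
    m = rng.poisson(c * n / 2)                   # approximates Binomial(C(n,2), c/n)
    for u, v in rng.integers(0, n, size=(m, 2)):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes = np.bincount([find(v) for v in range(n)])
    return sizes.max() / n

rng = np.random.default_rng(6)
for c in (0.5, 0.9, 1.0, 1.1, 1.5):
    frac = largest_component_fraction(20_000, c, rng)
    print(f"c = {c:<4} largest component fraction ~ {frac:.3f}")
```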

Dr. Yufeng Liu received NSF grant on “Foundations of Nonconvex Problems in BigData Science and Engineering: Models, Algorithms, and Analysis”

February 15, 2017

In today’s digital world, huge amounts of data, i.e., big data, can be found in almost every aspect of scientific research and human activity. These data need to be managed effectively for reliable prediction and inference to improve decision making. Statistical learning is an emerging scientific discipline in which mathematical modeling, computational algorithms, and statistical analysis are jointly employed to address these challenging data management problems. Invariably, quantitative criteria need to be introduced for the overall learning process in order to gauge the quality of the solutions obtained. This research focuses on two important criteria: data fitness and sparsity of the representation of the underlying learning model. Potential applications of the results can be found in computational statistics, compressed sensing, imaging, machine learning, bioinformatics, portfolio selection, and decision making under uncertainty, among many areas involving big data.

To date, convex optimization has been the dominant methodology for statistical learning, in which the two criteria employed are expressed by convex functions that are either optimized or imposed as constraints on the variables being sought. Recently, non-convex functions of the difference-of-convex (DC) type and the difference-of-convex algorithm (DCA) have been shown to yield superior results in many contexts, and they serve as the motivation for this project. The goal is to develop a solid foundation and a unified framework to address many fundamental issues in big data problems in which non-convexity and non-differentiability are present in the optimization problems to be solved. These two non-standard features in computational statistical learning are challenging, and their rigorous treatment requires the fusion of expertise from different domains of the mathematical sciences. Technical issues to be investigated will cover the optimality, sparsity, and statistical properties of computable solutions to the non-convex, non-smooth optimization problems arising from statistical learning and its many applications. Novel algorithms will be developed and tested, first on synthetic data sets for preliminary experimentation and then on publicly available data sets for realism; comparisons will be made among different formulations of the learning problems.
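
A minimal sketch of the difference-of-convex idea (illustration only; the capped-L1 penalty and the data below are made-up assumptions, not the project's formulations): write the nonconvex penalty as a difference of two convex functions, linearize the subtracted part at the current iterate, and solve the resulting convex subproblem, which here reduces to soft-thresholding.

```python
# Minimal DCA sketch: minimize 0.5*||y - x||^2 + lam * sum(min(|x_i|, delta))
# by writing the capped-L1 penalty as lam*|x| - lam*max(|x| - delta, 0),
# linearizing the subtracted convex part, and soft-thresholding.
import numpy as np

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

y = np.array([3.0, 0.4, -2.0, 0.1])   # assumed data
lam, delta = 0.5, 1.0                 # assumed tuning parameters
x = np.zeros_like(y)

for _ in range(20):
    s = lam * np.sign(x) * (np.abs(x) > delta)   # subgradient of the subtracted convex part
    x = soft(y + s, lam)                         # convex subproblem in closed form

print(x)   # large entries of y end up barely shrunk; small ones are zeroed out
```

The behavior shows the appeal of the nonconvex penalty relative to a plain L1 penalty: large coefficients escape the shrinkage bias while small, noisy ones are still set to zero.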