Dr. Jan Hannig and Dr. Steve Marron received NSF grant on “Statistical approaches to big data analytics”

February 17, 2017

The major challenges to be studied in this grant include Data Integration, Data Heterogeneity, and Parallelization. Data Integration addresses the recently recognized need to combine widely differing types of measurements made on a common set of subjects. For example, in cancer research, common measurements in modern Big Data sets include gene expression, copy number, mutations, methylation, and protein expression. Deep new statistical methods will be developed that focus on central scientific issues such as how the various measurements interact with each other and, simultaneously, which aspects operate independently. Data Heterogeneity addresses a different issue that is also critical in cancer research: current efforts to boost sample sizes (essential to deeper scientific insights) involve multiple laboratories combining their data. A new conceptual model for understanding the bias-related challenges this presents, together with foundations for the development of new analytical methods that are robust against such effects, will be developed here.

Dr. Shankar Bhamidi and Dr. Andrew Nobel received NSF grant on “Iterative Testing Procedures and High-dimensional Scaling Limits of Extremal Random Structures”

February 17, 2017

Over the past ten years there has been a great deal of work in the statistics community devoted to the problem of testing and estimating associations, in particular correlations, between variables in high-dimensional data sets. By definition, correlations capture pairwise relationships between variables, and there is a close formal relationship between the statistical analysis of correlations and the statistical analysis of networks. The statistical activity surrounding inference concerning correlations has been motivated in large part by the increasing use and importance of networks in a variety of fields, including economics, brain mapping, genomics, and biomedicine. Networks of proteins associated with a disease can point the way towards potential drug interventions; known networks may serve as inputs for predictive models of survival or response to therapy in breast cancer and other diseases. Concurrent with this growth in statistical methodology, recent developments in the fields of probabilistic combinatorics and machine learning have significantly advanced our understanding of discrete random structures that capture the association of high-dimensional objects. Although these powerful theoretical techniques can be brought directly to bear on a number of the correlation-based problems considered in the statistical community, to date no such cross-fertilization has taken place.
The proposed research has several complementary components. The first component is the development of an iterative testing procedure that identifies self-associated sets of vertices in a graph, and self-associated sets of variables in a high-dimensional data set. Within the framework of the iterative testing procedure, we will develop computationally efficient methods for several applied problems: mining of block correlation differences in two-sample studies, and identifying groups of mutually correlated variables in studies where each sample is assessed with two or more measurement platforms. As a special case of the latter problem, we will develop tools to enhance the power of genomic studies that link local genetic variation to global changes in gene expression. Development and application of the methods will be carried out in cooperation with researchers in genomics, biomedicine, and sociology at UNC, with whom the PI and co-PI have long-standing collaborations. The second component of the proposed research is to adapt and extend existing techniques in probabilistic combinatorics to provide supporting theory for the iterative testing procedure, and to address broader statistical questions concerning the testing and estimation of correlations.
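
The abstract does not spell out the procedure, but a minimal sketch of the iterative idea may help fix intuition (the update rule, threshold, and normal approximation below are illustrative assumptions, not the PIs' method):

```python
import numpy as np

def iterate_set(X, seed, z=3.0, max_iter=50):
    """Illustrative iteration: repeatedly replace the candidate set S with
    the variables whose mean correlation with S is significantly large."""
    n, p = X.shape
    Z = (X - X.mean(0)) / X.std(0)
    R = (Z.T @ Z) / n                 # sample correlation matrix
    np.fill_diagonal(R, 0.0)          # ignore self-correlations
    S = set(seed)
    for _ in range(max_iter):
        idx = sorted(S)
        score = R[:, idx].mean(axis=1)          # mean correlation with S
        # under independence, sqrt(n * |S|) * score is roughly N(0, 1)
        new_S = set(np.flatnonzero(np.sqrt(n * len(idx)) * score > z))
        if not new_S or new_S == S:             # converged (or died out)
            return new_S
        S = new_S
    return S

# toy data: variables 0-9 share a common factor, the rest are pure noise
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 100))
X[:, :10] += rng.standard_normal((200, 1))      # correlated block
print(iterate_set(X, seed=[0, 1]))              # typically recovers {0, ..., 9}
```

In this caricature, a self-associated set is a fixed point of the update: the set certifies its own internal correlation.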

Dr. Kai Zhang received NSF grant on “Geometric Perspectives on the Correlation”

February 16, 2017

In modern statistical analysis, datasets often contain a large number of variables with complicated dependence structures. This situation is especially common in important problems in economics, engineering, finance, genetics, genomics, neuroscience, and other fields. One of the most important measures of the dependence between variables is the correlation coefficient, which describes their linear dependence. In this setting, understanding the correlation and the behavior of correlated variables is a crucial problem and prompts statisticians to develop new theories and methods. Motivated by this challenge, the PI proposes to study the correlation through novel geometric perspectives. The overall objective is (1) to develop useful theories and methods on the correlation and (2) to build a stronger connection between geometry and statistics. The PI anticipates achieving these goals through an integration of research and education plans.
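
For reference, the correlation coefficient of variables X and Y, and its sample version computed from pairs (x_i, y_i), are

```latex
\rho(X,Y) \;=\; \frac{\operatorname{Cov}(X,Y)}{\sigma_X\,\sigma_Y},
\qquad
r \;=\; \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
             {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\;\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}},
```

with |ρ| = 1 exactly when one variable is an affine function of the other, which is the precise sense in which correlation measures linear dependence.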

The research agenda is to systematically investigate three fundamental aspects of the correlation: (1) the magnitude and distribution of the maximal spurious sample correlation; (2) the detection of a low-rank correlation structure; and (3) the probability measure over the space of correlation matrices. In these studies, the novel integration of statistical and geometric insights characterizes the proposed solutions and facilitates precise probability statements. Completion of the proposed research will provide a comprehensive understanding of the correlation and a stronger connection between geometry and statistics. The PI also has comprehensive plans for educating graduate and undergraduate students and for disseminating the research results to the broader scientific community.
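
A quick simulation illustrates the first aspect: even when all variables are truly independent, the maximal sample correlation among many variables is far from zero (the dimensions and seed here are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 1000                       # few samples, many variables
X = rng.standard_normal((n, p))       # all variables independent by design
Z = (X - X.mean(0)) / X.std(0)
R = (Z.T @ Z) / n                     # sample correlation matrix
np.fill_diagonal(R, 0.0)
print(np.abs(R).max())                # maximal spurious correlation; ~0.6 here
```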

Dr. Nilay Argon and Dr. Serhan Ziya received NSF grant on “Distribution of Patients to Medical Facilities in Mass-Casualty Events”

February 16, 2017

Mass-casualty events such as terrorist attacks and natural disasters can affect hundreds to thousands of people and place significant burdens on emergency response systems for unpredictable periods of time. During these events, emergency response management faces several complex operational decisions under time pressure and, sometimes, security and safety concerns. One fundamental decision is how to distribute casualties from the affected areas to multiple medical facilities that differ in capacity, specialty, and distance. Currently, this decision is left to the emergency transport officer in civilian settings and to battlefield commanders during military operations. Using mathematical modeling and analysis in conjunction with medical expertise, this project will build knowledge and decision tools to make casualty distribution more efficient and objective. This multidisciplinary project, bringing together operations researchers and emergency physicians, will benefit society directly by facilitating effective casualty distribution during disasters. It will also significantly contribute to the education of a diverse group of students from the operations research, public health, and medical fields.

In its most general form, the casualty-distribution problem is a stochastic sequential decision-making problem that involves parameters and variables such as the number of casualties at each location; the number of emergency vehicles; the capacity, capability, and congestion level of each hospital; travel times between locations and hospitals; and the condition of travel routes. The first phase of the project involves identifying the most fundamental tradeoffs underlying this complex decision-making problem and formulating separate models for each. These models will then be analyzed by means of exact methods, such as sample-path analysis and Markov decision processes, to obtain insights into the characteristics of optimal decision rules. In the second phase of the project, approximate approaches such as fluid models and Lagrangian relaxations will be used to develop heuristic policies. In the final phase, an extensive simulation study will be conducted to test the proposed principles and decision rules in more realistic settings, using data from the literature and the 2010 National Hospital Ambulatory Medical Care Survey. The mathematical models developed for this project can equivalently be seen as queueing models with dynamic routing. Hence, this project also contributes to the operations research literature by introducing and studying a new class of queue-routing problems, where travel to the queues takes time and possibly requires a scarce resource.
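
As a toy illustration of the congestion-versus-distance tradeoff (a minimal sketch under made-up assumptions: two single-bed hospitals, Poisson arrivals, exponential treatment times; not one of the project's models):

```python
import random

def simulate(policy, n_casualties=5000, lam=1.0, seed=1):
    """Route each casualty on arrival; each hospital is one FCFS server.
    Returns the mean time from the scene to the start of treatment."""
    random.seed(seed)
    travel = [0.2, 1.0]     # hospital 0 is closer (assumed travel times)
    mu = [1.1, 1.5]         # hospital 1 treats faster (assumed rates)
    free_at = [0.0, 0.0]    # earliest time each hospital can start treating
    t, total = 0.0, 0.0
    for _ in range(n_casualties):
        t += random.expovariate(lam)            # next casualty appears
        start = [max(t + travel[h], free_at[h]) for h in (0, 1)]
        h = 0 if policy == "nearest" else min((0, 1), key=lambda j: start[j])
        free_at[h] = start[h] + random.expovariate(mu[h])
        total += start[h] - t
    return total / n_casualties

for pol in ("nearest", "smallest-delay"):   # congestion-blind vs. congestion-aware
    print(pol, round(simulate(pol), 2))
```

Sending everyone to the nearest hospital lets its queue build up, while the delay-aware rule trades extra travel for shorter waits; the project's models capture this tradeoff with far more realism.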

Dr. Quoc Tran-Dinh received NSF grant on “Efficient methods for large scale self concordant convex minimization”

February 15, 2017

Recent progress in modern convex optimization provides a powerful tool for scientific discovery. In theory, many classical convex optimization models have well-understood structures and hence can be solved efficiently by state-of-the-art algorithms. In practice, however, modern applications present a host of increasingly large-scale and nonsmooth models that can render these methods impractical. Fortunately, recent advances in convex optimization offer a surprising new angle from which to fundamentally re-examine the theory and practice of large-scale problems in a unified fashion. This project focuses on exploiting and generalizing a prominent concept known as self-concordance to develop new, efficient convex optimization techniques for two classes of large-scale convex optimization problems. The work is organized into three interdisciplinary work packages (WPs).
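
For context, a convex function f of one variable is called (standard) self-concordant when its third derivative is controlled by its second,

```latex
|f'''(x)| \;\le\; 2\, f''(x)^{3/2} \qquad \text{for all } x \in \operatorname{dom} f,
```

and a convex function on R^n is self-concordant when every restriction t ↦ f(x + tv) is. The canonical example is the log-barrier f(x) = −log(x): here f''(x) = x^{−2} and |f'''(x)| = 2x^{−3} = 2 f''(x)^{3/2}, so the bound holds with equality.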

WP1. Composite self-concordant convex optimization: While existing convex optimization methods essentially rely on the Lipschitz gradient assumption, the PI instead focuses on the self-concordance structure and its generalizations. This concept is key to the theory of interior-point methods, but has remained unexploited in composite minimization. Grounded in this structure, the PI will develop novel and provable convex optimization algorithms for solving several subclasses of large-scale composite convex problems.

WP2. Constrained convex optimization involving self-concordant barriers: Many constrained convex applications carry a self-concordant barrier structure, while their other convex constraints often have a “simple” structure. Existing general-purpose convex algorithms solve these problems mainly by employing either a standard interior-point method or an augmented Lagrangian framework. The PI instead concentrates on exploiting the special structures of these problems and combining them with both the interior-point idea and the proximal framework to develop new and scalable algorithms equipped with rigorous convergence guarantees, while offering parallel and distributed implementations.

WP3. Implementation and applications: This WP aims at investigating the implementation aspects of the PI’s algorithms and upgrading his SCOPT solver. The theory and methods developed in WP1 and WP2 will be validated through three concrete applications: Poisson imaging, graph learning, and max-cut-type problems. While these applications differ, their underlying convex formulations share the following features: (i) the objective does not have a Lipschitz continuous gradient but does possess a self-concordance structure, and (ii) the problem dimension can easily reach several billion variables.
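
As one illustration of feature (i), consider a standard convex formulation of Poisson imaging (the generic maximum-likelihood objective below is an assumption for illustration, not necessarily the exact model used in the project), with measurement vectors a_i and observed counts b_i:

```latex
f(x) \;=\; \sum_{i=1}^{m} \Big( a_i^{T} x \;-\; b_i \log\!\big(a_i^{T} x\big) \Big).
```

Its gradient contains terms −b_i a_i / (a_i^T x) that blow up as a_i^T x → 0, so no global Lipschitz constant for the gradient exists, while the logarithmic terms supply exactly the self-concordance-type structure that WP1 and WP2 exploit.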

Dr. Shankar Bhamidi received NSF grant on “Dynamic network models on entrance boundary and continuum scaling limits, condensation phenomena and probabilistic combinatorial optimization”

February 15, 2017

The last few years have witnessed an explosion in the amount of empirical data on real networks, motivating an array of mathematical models for the evolution of such networks. Examples range from biological networks (brain networks of interacting neurons) and information transmission (the Internet) to transportation, social networks, swarm intelligence, and the evolution of self-organized behavior through the interactions of simple agents. This has stimulated vigorous activity in a multitude of fields, including biology, statistical physics, statistics, mathematics, and computer science, to understand these models and quantify their predictions and relevance to real systems. The aim of this grant is to develop systematic mathematical theory to understand dynamic networks: systems that evolve over time through probabilistic rules. Using models motivated by colloidal chemistry, we will develop robust mathematical techniques to understand how macroscopic connectivity in the network arises via microscopic interactions between agents in the network. This is important in areas such as epidemic modeling and social networks, where core questions of interest include whether a disease or a message can reach a significant fraction of the population of interest. Mathematical techniques used to understand such questions have unexpected connections to combinatorial optimization, where one is interested in designing optimal networks between individuals. In particular, the techniques developed in the grant will be used to understand asymptotics, in the large-network limit, for one of the most fundamental such objects: the minimal spanning tree. Lastly, we will explore meta-heuristics, including swarm optimization algorithms (inspired by the collective behavior of simple individuals such as ants) and their ability to solve hard optimization problems via probabilistic interaction rules through stigmergy (where the network of interacting agents changes the underlying environment, which in turn affects the interactions of the agents). An important component of the grant is the involvement of students at all levels, including the development of undergraduate research seminars and research projects.
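
A quick simulation of the classical Erdős–Rényi model already shows this micro-to-macro transition (graph size and seed are arbitrary choices):

```python
import networkx as nx

n = 20000
for c in (0.5, 1.0, 1.5, 2.0):       # mean degree; each edge appears with prob. c/n
    G = nx.fast_gnp_random_graph(n, c / n, seed=42)
    giant = max(nx.connected_components(G), key=len)
    print(f"mean degree {c}: largest component holds {len(giant) / n:.3f} of nodes")
```

Below mean degree 1 the largest component is microscopic; above it, a giant component emerges whose limiting fraction solves x = 1 − e^{−cx} (about 0.58 at c = 1.5 and 0.80 at c = 2).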

The nature of the emergence of the giant component and the critical scaling window in random graph models has stimulated an enormous amount of work in probabilistic combinatorics since the middle of the last century; most techniques deal with gross features such as maximal component sizes. Understanding the metric structure of these components in inhomogeneous random graphs has been particularly daunting, despite being the key to understanding more complicated strong-disorder systems. The proposal develops a unified set of tools, through dynamic encoding of the network models of interest and tracking of the entire trajectory of evolution of these systems, in order to understand the metric scaling of the internal structure of maximal components in the critical regime. We aim to show convergence to continuum limiting objects based on tilted inhomogeneous continuum random trees and, in particular, to prove universality for many of the major families of random graph models. Connections between these questions and structural properties of dynamic constructions of random graph models, in particular scaling exponents of key susceptibility functions in the barely subcritical regime, will be studied. The relation between the metric structure of components in the critical regime and the entrance boundary of Markov processes such as the multiplicative coalescent will be explored. The entire program is the first step in understanding scaling limits of fundamental models in strong disorder, including the minimal spanning tree on the giant component. These models have spawned a wide array of universality conjectures from statistical physics. In a related direction, we will study optimization algorithms and meta-heuristics inspired by reinforcing interacting particle systems and stigmergy, and their relationship to key probabilistic systems such as reinforced random walks and stochastic dynamical systems. The aim in this direction is to provide qualitative insights and quantitative predictions on hard models in probabilistic combinatorial optimization, such as the traveling salesman problem.
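
A concrete instance of such a large-network limit is Frieze's classical theorem: the total weight of the minimal spanning tree of the complete graph on n vertices with i.i.d. Uniform(0,1) edge weights converges to ζ(3) ≈ 1.202. A short numerical check (sizes arbitrary):

```python
import random
import networkx as nx

def mst_weight(n, seed=0):
    random.seed(seed)
    G = nx.complete_graph(n)
    for u, v in G.edges():
        G[u][v]["weight"] = random.random()     # i.i.d. Uniform(0,1) weights
    return nx.minimum_spanning_tree(G).size(weight="weight")

for n in (50, 200, 800):
    print(n, round(mst_weight(n), 3))           # approaches zeta(3) ≈ 1.202
```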

Dr. Yufeng Liu received NSF grant on “Foundations of Nonconvex Problems in BigData Science and Engineering: Models, Algorithms, and Analysis”

February 15, 2017

In today’s digital world, huge amounts of data, i.e., big data, can be found in almost every aspect of scientific research and human activity. These data need to be managed effectively for reliable prediction and inference to improve decision making. Statistical learning is an emerging scientific discipline in which mathematical modeling, computational algorithms, and statistical analysis are jointly employed to address these challenging data management problems. Invariably, quantitative criteria need to be introduced for the overall learning process in order to gauge the quality of the solutions obtained. This research focuses on two important criteria: data fitness and sparsity of the representation of the underlying learning model. Potential applications of the results can be found in computational statistics, compressed sensing, imaging, machine learning, bioinformatics, portfolio selection, and decision making under uncertainty, among many areas involving big data.

Until now, convex optimization has been the dominant methodology for statistical learning, in which the two criteria employed are expressed by convex functions, either to be optimized or to be set as constraints on the variables being sought. Recently, non-convex functions of the difference-of-convex (DC) type and the difference-of-convex algorithm (DCA) have been shown to yield superior results in many contexts, and they serve as the motivation for this project. The goal is to develop a solid foundation and a unified framework to address many fundamental issues in big data problems in which non-convexity and non-differentiability are present in the optimization problems to be solved. These two non-standard features in computational statistical learning are challenging, and their rigorous treatment requires the fusion of expertise from different domains of the mathematical sciences. Technical issues to be investigated will cover the optimality, sparsity, and statistical properties of computable solutions to the non-convex, non-smooth optimization problems arising from statistical learning and its many applications. Novel algorithms will be developed and tested first on synthetic data sets for preliminary experimentation and then on publicly available data sets for realism; comparisons will be made among different formulations of the learning problems.
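
For reference, the DC framework writes the objective as f = g − h with both g and h convex; the DCA then linearizes the subtracted part at the current iterate and solves a convex subproblem:

```latex
x^{k+1} \;\in\; \arg\min_{x}\;\Big\{\, g(x) \;-\; \big\langle \nabla h(x^{k}),\, x \big\rangle \,\Big\}.
```

Since the linearization lies below h, the resulting surrogate lies above f and touches it at x^k, so each step cannot increase the objective: f(x^{k+1}) ≤ f(x^k).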

Dr. Andrew Nobel received NSF collaboration grant on “Random dynamical systems and limit theorems for optimal tracking”

February 15, 2017

The proposed research investigates the structure of families of dynamical systems from two complementary points of view. The first point of view is the “forward problem,” in which one chooses a system at random from an ensemble of systems and describes its properties with high probability. The second point of view is the “inverse problem,” in which one makes observations from an unknown system within a known family and attempts to recover some information about the unknown system from the observations. The research has three primary aims.
Aim 1: To describe the likely structural properties of dynamical systems that evolve according to randomly chosen rules.
Aim 2: To establish a rigorous theoretical foundation for the analysis of optimization problems and related statistical inference methodology for dynamical systems.
Aim 3: To develop a relative version of the thermodynamic formalism and investigate its connections to Bayesian inference.

Investigators: Kevin McGoff (PI), Sayan Mukherjee and Andrew Nobel (co-PIs)

Hannan gift

February 8, 2017

The Department of Statistics and Operations Research received a generous gift from the estate of Dr. James Francis Hannan and Ms. Bettie Creighton Hannan that will support our recruitment and retention of top graduate students and faculty. We are grateful to the family of Dr. James Hannan and Ms. Bettie Hannan. Dr. Hannan received his Ph.D. from the Department of Statistics at UNC in 1953 under the direction of Dr. Herbert Robbins. Dr. Hannan reminisced fondly about his time in Chapel Hill as a graduate student in a very interesting interview that appeared in Statistical Science (https://projecteuclid.org/euclid.ss/1280841737).

Dr. Yufeng Liu given Breiman award

January 23, 2017

Professor Yufeng Liu was selected for the inaugural Leo Breiman Junior Award by the Section on Statistical Learning and Data Science (SLDS) of the ASA, in recognition of the great impact of his work in the area of statistical learning and data science.

As an award recipient, Yufeng will deliver an invited lecture, together with senior awardee Grace Wahba and junior co-awardee Ming Yuan, at JSM 2017 in Baltimore, Maryland.