Ph.D. Defense-Hui Shen
12 Jun @ 10:00 am - 1:00 pm
The Department of Statistics and Operations Research
The University of North Carolina at Chapel Hill
Ph.D. Thesis Defense
Friday, June 16, 2023
10:00 AM
130 Hanes Hall
Or
Zoom
Zoom link: https://unc.zoom.us/j/93038814316?pwd=dGFhV1NxOGFCT1lOUWdXdjhTbEllZz09
Meeting ID: 930 3881 4316
Passcode: 156509
Hui Shen
Consistency of Some Statistical Learning Techniques: Unsupervised Learning and Network Change Point
Under the direction of Shankar Bhamidi and Yufeng Liu
In statistics and machine learning, unsupervised learning techniques are popular for data exploration, including structure identification, clustering, and change-point detection. In this dissertation, we address some unsupervised learning problems in the high-dimensional setting.
In the first direction, we consider the problem of assessing the statistical significance of general unimodal clusters. We extend SigClust, an important existing method for evaluating the significance of clustering, to the setting of multidimensional scaling (MDS).
In the second direction, we conduct a theoretical investigation into Lloyd’s algorithm, one of the most popular clustering algorithms in practice. We aim to improve the theoretical understanding of Lloyd’s algorithm, particularly when dimension reduction is applied to high-dimensional clusterable data. Our result is shown to be useful in multiple applications, including spectral clustering in stochastic block models and multidimensional scaling for sub-Gaussian mixture models.
In the third direction, we study the network change-point detection problem, which is challenging due to the sparsity and high dimensionality of network data. We introduce a general class of Markovian network change-point models allowing flexible spatial and temporal dependence. To detect network change points, we propose new CUSUM-type statistics based on static and evolutionary graph structure representations, including graph counts and sampled network motifs.