Ph.D. Defense: Hyo Young Choi
The Department of
Statistics and Operations Research
The University of North Carolina at Chapel Hill
Ph.D. Thesis Defense
Tuesday, August 14th, 2018
130 Hanes Hall
Hyo Young Choi
Scissor for finding outliers in RNA-seq
(Under the direction of Dr. Marron and Dr. Hayes)
The impressive progress of high-throughput technologies has produced many interesting modern data types, which in turn has greatly increased the demand for statistics. RNA-seq, in particular, allows a rich characterization of the genome with many exciting applications.
In this presentation, we address several statistical challenges in RNA-seq data, particularly those arising from its high dimensionality, with the goal of detecting outliers in RNA-seq. The first part of the presentation concerns high-dimensional outliers, which are challenging to distinguish from inliers because of the special structure of high-dimensional space. In particular, because data are sparse in high-dimensional space, classical outlier detection methods such as distance-based or density-based approaches do not work well. We introduce a new notion of high-dimensional outliers that encompasses various types of outliers and provides deep insight into their behavior. Using this new framework, we then explore PCA subspace consistency and strong inconsistency under several asymptotic regimes.
In the second part, we describe Scissor, a novel approach to unsupervised screening for a variety of shape changes that are possibly associated with important genetic events. Using the theoretical results from the first part, we propose a two-step procedure that identifies global and local shape changes in RNA-seq and characterizes the underlying outlying structure. In an in-depth analysis of frequently mutated tumor suppressor genes, Scissor identifies novel shape changes that appear to be associated with new splicing variants missed by current variant callers.