PhD Defense: Xi Yang
25 Jun @ 1:00 pm - 2:30 pm
Machine Learning Methods in HDLSS Settings
Various machine learning methods have been developed to explore high-dimension, low-sample-size (HDLSS) data arising in fields such as genetics, finance, and computer science. This dissertation introduces novel methods and improves existing ones, evaluating them on cancer genetics data.
Assessing the statistical significance of differences between subgroups is a central question for HDLSS data. The Direction Projection Permutation (DiProPerm) hypothesis test answers this question in a way that is directly connected to a visual analysis of the data. Under some circumstances, however, DiProPerm can lose power and accuracy when measuring the significance of the tested pairs of groups. We first introduce a new permutation method that increases the power of the test in high-signal situations and yields simulated null test statistics that are more reasonable and unimodal. Our theoretical analysis then provides an adjustment to inference under both permutation schemes, which lets us exploit this improved power. We also add confidence measures that reflect the Monte Carlo uncertainty in the test; these prove very useful for comparing results across different contexts.
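For readers unfamiliar with DiProPerm, a minimal sketch of the standard (label-permutation) scheme follows. It is not the dissertation's new permutation method or its inference adjustment, which the abstract does not specify. Assumptions: DiProPerm is usually run with a DWD direction, but this sketch swaps in the simpler mean-difference direction; the statistic is the difference of projected class means; and the Monte Carlo interval uses a plain normal approximation to the binomial. All function names are hypothetical.

```python
import numpy as np


def mean_diff_direction(X, y):
    """Unit vector joining the two class means (stand-in for DWD; assumption)."""
    d = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    norm = np.linalg.norm(d)
    return d / norm if norm > 0 else d


def projected_stat(X, y):
    """Train a direction, project the data onto it, and return the
    difference of the projected class means."""
    w = mean_diff_direction(X, y)
    scores = X @ w
    return scores[y == 1].mean() - scores[y == 0].mean()


def diproperm(X, y, n_perm=1000, seed=0):
    """Standard DiProPerm: permute labels, re-train the direction on each
    permuted labelling, and compare the observed statistic to that null."""
    rng = np.random.default_rng(seed)
    observed = projected_stat(X, y)
    null = np.empty(n_perm)
    for b in range(n_perm):
        y_perm = rng.permutation(y)
        null[b] = projected_stat(X, y_perm)
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, null, p_value


def mc_interval(p_hat, n_perm, z=1.96):
    """Normal-approximation interval for the Monte Carlo error in p_hat
    (a simple choice; the dissertation's confidence measures may differ)."""
    se = np.sqrt(p_hat * (1 - p_hat) / n_perm)
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


if __name__ == "__main__":
    # Toy HDLSS example: 40 samples, 500 features, signal in 10 features.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((40, 500))
    y = np.repeat([0, 1], 20)
    X[y == 1, :10] += 3.0
    obs, null, p = diproperm(X, y, n_perm=200)
    print(obs, p, mc_interval(p, 200))
```

Re-training the direction inside the loop matters: in HDLSS data, any direction fitted to the labels separates the classes to some degree, so the null must include that fitting step rather than only permuting projected scores.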
Another important goal of this dissertation is to understand the drivers of Angle-Based Joint and Individual Variation Explained (AJIVE). An important open problem is statistical inference on the AJIVE loadings, to determine which features are significant in the analysis. Jackstraw is a general method for finding the statistically significant drivers associated with unobserved latent variables. We develop a method based on similar ideas for the richer context of AJIVE, giving precise inference on which loadings are significant.
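To illustrate the jackstraw idea the AJIVE method builds on, here is a minimal sketch of jackstraw for ordinary PCA, not the dissertation's AJIVE extension. A few rows (features) are permuted independently to break their association with the latent structure, the top principal components are recomputed, and the association statistics of the permuted rows form an empirical null. Assumptions: association is measured by an F statistic from regressing each feature on the top `r` sample scores; `s` (rows permuted per round), `B` (rounds), the add-one p-value correction, and all function names are illustrative choices.

```python
import numpy as np


def top_pcs(Y, r):
    """Top-r right singular vectors (sample scores) of row-centered Y
    (features x samples); returns an n x r matrix."""
    Yc = Y - Y.mean(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(Yc, full_matrices=False)
    return vt[:r].T


def f_stats(Y, V):
    """F statistic for each row of Y regressed on the columns of V plus an intercept."""
    n, r = V.shape
    Z = np.column_stack([np.ones(n), V])              # n x (r + 1) design
    beta, *_ = np.linalg.lstsq(Z, Y.T, rcond=None)   # (r + 1) x m coefficients
    fitted = (Z @ beta).T                             # m x n fitted values
    rss1 = np.sum((Y - fitted) ** 2, axis=1)          # full-model residual SS
    rss0 = np.sum((Y - Y.mean(axis=1, keepdims=True)) ** 2, axis=1)  # intercept only
    return ((rss0 - rss1) / r) / (rss1 / (n - r - 1))


def jackstraw_pca(Y, r, s=10, B=100, seed=0):
    """Per-feature p-values for association with the top-r PCs."""
    rng = np.random.default_rng(seed)
    m, _ = Y.shape
    observed = f_stats(Y, top_pcs(Y, r))
    null = []
    for _ in range(B):
        rows = rng.choice(m, size=s, replace=False)
        Ystar = Y.copy()
        for i in rows:
            # Permuting a row across samples breaks its link to the latent variables.
            Ystar[i] = rng.permutation(Ystar[i])
        Vstar = top_pcs(Ystar, r)
        null.append(f_stats(Ystar[rows], Vstar))
    null = np.concatenate(null)                       # s * B null statistics
    exceed = (null[None, :] >= observed[:, None]).sum(axis=1)
    return (1 + exceed) / (null.size + 1)


if __name__ == "__main__":
    # Toy data: 200 features, 30 samples, one latent factor driving the first 20.
    rng = np.random.default_rng(2)
    latent = rng.standard_normal(30)
    Y = rng.standard_normal((200, 30))
    Y[:20] += 3.0 * latent
    p = jackstraw_pca(Y, r=1, s=10, B=50)
    print(np.round(p[:5], 4), np.round(p[-5:], 4))
```

Permuting only a small number of rows per round keeps the recomputed components close to those of the original data, so the null statistics reflect the same fitted latent structure that the observed statistics were measured against.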
The proposed machine learning methods are evaluated on genetic data sets, and these analyses also yield results of independent interest to biologists.