PhD Defense: Hang Yu
Sparse Machine Learning Methods for Prediction and Personalized Medicine
With growing interest in using black-box machine learning for complex data with many feature variables, it is critical to obtain a prediction model that depends on only a small set of features to maximize generalizability. Feature selection therefore remains an important and challenging problem in modern applications. Most existing methods for feature selection are based on either parametric or semiparametric models, so their performance can suffer severely from model misspecification when high-order nonlinear interactions among the features are present. Thus, nonparametric feature selection for high-dimensional data is an important and challenging problem in statistics and machine learning. We propose a new framework to perform nonparametric feature selection for both regression and classification problems. Under this framework, we learn prediction functions through empirical risk minimization over a reproducing kernel Hilbert space (RKHS). The space is generated by a novel tensor product kernel that depends on a set of parameters determining the importance of the features. We study the theoretical properties of the kernel feature space and prove the oracle selection property and Fisher consistency of our proposed method. We then apply the nonparametric feature selection framework to treatment decision making with high-dimensional data in the field of personalized medicine. With modifications to the algorithms, the computation becomes fast and stable. We also include simulation studies and real-world applications to demonstrate the superior performance of the proposed method compared to existing methods.
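To make the idea of an importance-weighted tensor product kernel concrete, the following is a minimal sketch (not the thesis's exact construction) assuming a Gaussian-type kernel with a nonnegative weight theta_j per feature; a feature with theta_j = 0 drops out of the kernel entirely, so sparsity in theta performs feature selection:

```python
import math
import numpy as np

def tensor_product_kernel(x, z, theta):
    """Importance-weighted Gaussian tensor product kernel:
    k_theta(x, z) = prod_j exp(-theta_j * (x_j - z_j)**2).
    theta_j >= 0 controls the importance of feature j;
    theta_j = 0 makes the kernel invariant to feature j."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    theta = np.asarray(theta, dtype=float)
    # The product of per-feature Gaussian factors equals
    # the exponential of the weighted sum of squared differences.
    return float(np.exp(-np.sum(theta * diff**2)))

# Hypothetical example: only the first two of four features matter.
theta = np.array([1.0, 0.5, 0.0, 0.0])
x = np.array([0.2, 1.0, 5.0, -3.0])
z = np.array([0.1, 0.8, -2.0, 7.0])
k = tensor_product_kernel(x, z, theta)
```

In this sketch, any prediction function built from this kernel (e.g. by empirical risk minimization over the induced RKHS) automatically ignores the features whose weights are zero.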