Ph.D. Defense: Yifan Cui
Thursday, April 5th, 2018
103 New West
Tree-based survival models and precision medicine
(Under the direction of Dr. Michael R. Kosorok and Dr. Jan Hannig)
Random forests have become one of the most popular machine learning tools in recent years. The main advantage of tree- and forest-based models is their nonparametric nature. My dissertation focuses mainly on a particular type of tree and forest model in which the outcomes are right-censored survival data. Censored survival data arise frequently in biomedical studies, where the true clinical outcome may not be directly observable because of early dropout or other reasons.
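As a minimal illustration of right censoring (not taken from the dissertation), the sketch below implements the Kaplan-Meier product-limit estimator of the survival function. An event indicator of 0 marks a censored subject, such as an early dropout, whose true failure time is only known to exceed the observed time. Survival trees commonly apply an estimator of this kind to the subjects in each terminal node. The function name `kaplan_meier` and the toy data are hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of S(t) for right-censored data (illustrative sketch).

    times[i]  : observed time for subject i (event time or censoring time)
    events[i] : 1 if the event was observed, 0 if the subject was censored
    Returns (t, S(t)) pairs at each distinct time with at least one event.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same observed time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the estimated conditional survival at t.
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Both failures and censorings leave the risk set after time t.
        n_at_risk -= removed
    return curve
```

For example, `kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1])` gives survival estimates of about 0.833, 0.667, 0.444 and 0.0 at times 2, 3, 5 and 9. The subjects censored at times 3 and 8 count toward the risk set up to their censoring times but never produce a drop in the curve.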
We first carry out a comprehensive analysis of survival random forests and survival tree models and establish the consistency of these popular machine learning models by developing a general theoretical framework. Our results significantly improve the current understanding of such models, and this is the first consistency result for tree- and forest-based regression estimators with censored outcomes in high-dimensional settings. In particular, the consistency results are derived by analyzing the splitting rules and establishing an adaptive concentration bound for the variance component, which may also shed light on the theoretical analysis of other random forest models.
In the second part, motivated by tree-based survival models, we propose a fiducial approach to constructing pointwise and curvewise confidence intervals for survival functions. Within each terminal node, estimation is essentially a small-sample problem, possibly with heavy censoring. Most asymptotic methods for constructing confidence intervals fail to maintain nominal coverage in many such scenarios, whereas the proposed fiducial-based pointwise confidence intervals maintain coverage in these situations. Furthermore, the proposed pointwise confidence intervals are often shorter on average than those of competing methods that also maintain coverage.
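To make the coverage issue concrete, the sketch below is a hypothetical Monte Carlo check of one classical asymptotic method: the 95% Wald interval built from Greenwood's variance formula for the Kaplan-Meier estimate. It is not the fiducial method of the dissertation. It assumes exponential failure times with rate 1, exponential censoring times (with the default rates, about half of the subjects are censored), and a small node of n = 10 subjects. All function names and settings are illustrative.

```python
import math
import random


def km_greenwood(times, events, t0):
    """Kaplan-Meier estimate of S(t0) together with Greenwood's variance estimate."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, gw_sum = 1.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= t0:
        t = data[i][0]
        deaths = removed = 0
        # Group tied observations at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            # Greenwood term; skipped when everyone at risk fails (S drops to 0 anyway).
            if n_at_risk > deaths:
                gw_sum += deaths / (n_at_risk * (n_at_risk - deaths))
        n_at_risk -= removed
    return surv, surv * surv * gw_sum


def wald_coverage(n=10, t0=1.0, censor_rate=1.0, reps=2000, seed=0):
    """Monte Carlo coverage of the Wald interval S_hat +/- 1.96 * sqrt(Greenwood variance)."""
    rng = random.Random(seed)
    true_s = math.exp(-t0)  # T ~ Exponential(1), so S(t0) = exp(-t0)
    hits = 0
    for _ in range(reps):
        times, events = [], []
        for _ in range(n):
            t = rng.expovariate(1.0)          # failure time
            c = rng.expovariate(censor_rate)  # censoring time
            times.append(min(t, c))
            events.append(1 if t <= c else 0)
        s_hat, var = km_greenwood(times, events, t0)
        half = 1.96 * math.sqrt(var)
        if s_hat - half <= true_s <= s_hat + half:
            hits += 1
    return hits / reps
```

Comparing `wald_coverage()` with the nominal 0.95 gives a rough sense of how such intervals behave in small, heavily censored samples. The estimate depends on `n`, `t0` and `censor_rate`. Note that the interval collapses to zero width whenever the estimate is exactly 0 or 1, which becomes more likely as a node gets smaller. Without censoring, the Greenwood variance reduces to the binomial variance S(1 - S)/n.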
In the third part, we show an application of tree-based survival models in precision medicine. We extend outcome weighted learning to right-censored survival data without requiring either inverse probability of censoring weighting or semiparametric modeling of the censoring and failure times. To accomplish this, we use tree-based models to nonparametrically impute survival times in two different ways. We also illustrate the proposed method with data from a phase III clinical trial of non-small cell lung cancer.
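The imputation step can be illustrated with one standard nonparametric choice. This choice is an assumption here and not necessarily either of the two schemes used in the dissertation. A time censored at c is replaced by the restricted conditional mean E[min(T, tau) | T > c] = c + (1/S(c)) * integral from c to tau of S(u) du. Here S is a step survival curve, such as a Kaplan-Meier estimate from the subject's terminal node, and tau is a truncation time. The helper names `survival_at` and `impute_conditional_mean` are hypothetical.

```python
from typing import List, Tuple

Curve = List[Tuple[float, float]]  # (time, S(time)) pairs sorted by time


def survival_at(curve: Curve, t: float) -> float:
    """Evaluate a right-continuous step survival curve at t (S = 1 before the first jump)."""
    s = 1.0
    for time, value in curve:
        if time <= t:
            s = value
        else:
            break
    return s


def impute_conditional_mean(curve: Curve, c: float, tau: float) -> float:
    """Hypothetical imputation: E[min(T, tau) | T > c] under the step curve."""
    if c >= tau:
        return tau  # min(T, tau) equals tau whenever T > c >= tau
    s_c = survival_at(curve, c)
    if s_c == 0.0:
        # No estimated survival mass beyond c; falling back to c is an arbitrary choice.
        return c
    # Integrate S(u) over [c, tau] piece by piece between the jump points.
    grid = [c] + [t for t, _ in curve if c < t < tau] + [tau]
    area = 0.0
    for left, right in zip(grid[:-1], grid[1:]):
        area += survival_at(curve, left) * (right - left)
    return c + area / s_c
```

With the curve [(2, 5/6), (3, 2/3), (5, 4/9), (9, 0.0)] and tau = 9, a subject censored at time 3 is imputed as 3 + (28/9)/(2/3) = 23/3, or about 7.67. This is the average of the remaining event times 5 and 9, weighted by their conditional probabilities 1/3 and 2/3.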