PhD Defense: Zichao Li
30 Mar @ 12:30 pm - 2:30 pm
Title: Clustering and Classification with Feature Selection for High-Dimensional Data
Abstract: In this dissertation, we discuss several methods for clustering and classification with feature selection for high-dimensional data. In the first part, we focus on the problem of biclustering, which is the task of simultaneously clustering the rows and columns of the data matrix into different subgroups such that the rows and columns within a subgroup exhibit similar patterns. We provide a new formulation of the biclustering problem based on the idea of minimizing the empirical clustering risk, and introduce a novel algorithm that alternately applies an adapted version of the k-means clustering algorithm between columns and rows. In the second part, we develop a new classification method based on nearest centroid, using disjoint sets of features. We present a simple algorithm based on adapted k-means clustering that can find the subsets of features used in our method and extend the algorithm to perform feature selection. In the third part, we study the problem of classification with feature selection, where the features are selected iteratively in a supervised way to optimize predictive performance. We propose to use beam search to perform feature selection, which can be viewed as a generalization of forward selection. In all parts of the dissertation, we evaluate and compare the performance of our methods to other related methods on both simulated data and real-world datasets.
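To illustrate the statement that beam search can be viewed as a generalization of forward selection, here is a minimal sketch. It is not the dissertation's algorithm: the function name `beam_search_select`, its parameters, and the scoring table `TOY_SCORES` are all hypothetical, and the score stands in for whatever measure of predictive performance (for example, cross-validated accuracy) is actually used. At each step, every subset in the beam is extended by one feature and only the `beam_width` best extensions are kept; with `beam_width=1` this reduces to ordinary forward selection.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple


def beam_search_select(
    n_features: int,
    score: Callable[[FrozenSet[int]], float],
    beam_width: int = 3,
    max_features: int = 5,
) -> Tuple[FrozenSet[int], float]:
    """Return the best-scoring feature subset found by beam search.

    `score` is an assumed, user-supplied function mapping a feature
    subset to a number where higher is better.
    """
    beam: List[FrozenSet[int]] = [frozenset()]
    best_subset: FrozenSet[int] = frozenset()
    best_score = float("-inf")

    for _ in range(max_features):
        # Extend every subset in the beam by one feature it does not contain.
        candidates = {
            subset | {j}
            for subset in beam
            for j in range(n_features)
            if j not in subset
        }
        if not candidates:
            break

        # Score each candidate once; break ties by feature indices
        # so the result is deterministic.
        scores = {c: score(c) for c in candidates}
        ranked = sorted(candidates, key=lambda c: (-scores[c], sorted(c)))

        # Keep only the top `beam_width` subsets for the next round.
        beam = ranked[:beam_width]

        if scores[beam[0]] > best_score:
            best_subset, best_score = beam[0], scores[beam[0]]

    return best_subset, best_score


# Hypothetical scores: feature 0 looks best alone, but the pair {1, 2}
# is best jointly, so greedy forward selection misses it.
TOY_SCORES: Dict[FrozenSet[int], float] = {
    frozenset({0}): 0.6,
    frozenset({1}): 0.5,
    frozenset({2}): 0.5,
    frozenset({0, 1}): 0.65,
    frozenset({0, 2}): 0.65,
    frozenset({1, 2}): 0.9,
}


def toy_score(subset: FrozenSet[int]) -> float:
    return TOY_SCORES.get(subset, 0.0)


forward = beam_search_select(3, toy_score, beam_width=1, max_features=2)
beam2 = beam_search_select(3, toy_score, beam_width=2, max_features=2)
```

On this toy table, `forward` ends at the subset {0, 1} with score 0.65, while `beam2` keeps feature 1 alive in the first round and reaches {1, 2} with score 0.9. The cost is that a wider beam evaluates more candidate subsets per step.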