Ph.D. Defense: Haixu Ma
Tuesday, February 6, 2024 @ 11:00 am – 1:00 pm
Location: 115 Hanes Hall
Zoom Link for virtual attendance: https://unc.zoom.us/j/4570368947
Haixu Ma
Flexible Machine Learning and Reinforcement Learning in Decision Making
(Under the direction of Yufeng Liu and Donglin Zeng)
Abstract: Machine learning and Reinforcement Learning (RL) have received considerable attention in decision-making problems. In individualized decision making, treatment effects can be highly heterogeneous across individuals, so decision makers aim to tailor treatment decision rules precisely to different subgroups. In the first part of this dissertation, we propose several new approaches to estimate optimal Individualized Treatment Rules (ITRs) when many treatments are available. In the first project, we introduce the group-structured ITR and propose GRoup Outcome Weighted Learning (GROWL) to estimate the latent structure in the treatment space and the optimal group-structured ITRs through a single optimization. Fisher consistency, an excess risk bound, and the convergence rate of the value function are established to provide theoretical guarantees for GROWL. In the second project, we propose a novel adaptive-fusion-based method that clusters treatments with similar treatment effects and simultaneously estimates the optimal ITR via a single convex optimization. The problem is formulated as balancing a loss term against a penalty term with a tuning parameter, which allows the entire solution path of the treatment clustering process to be visualized hierarchically.

Recently, RL has demonstrated its ability to improve the efficiency of fundamental problems that require extensive computational resources. In the second part of this dissertation, we focus on using RL for efficient variable selection in big data. In the third project, we propose a novel approach, REinforcement learning for Variable Selection (REVS), within the Markov decision process framework. By prioritizing long-term variable selection accuracy, we propose a dynamic policy that adjusts the candidate set of important variables, guiding it toward convergence to the true variable set. To enhance computational efficiency, we present an online policy iteration algorithm integrated with temporal difference learning for sequential policy improvement. REVS is shown to achieve accurate variable selection with highly efficient computation.
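The abstract's final method pairs policy iteration with temporal-difference learning. As a minimal generic illustration of that recipe (not the REVS algorithm itself), the sketch below alternates TD(0) policy evaluation with greedy policy improvement on a toy two-state MDP; the states, rewards, step size, and discount factor here are all invented for illustration.

```python
# Generic policy iteration with TD(0) evaluation on a toy MDP.
# All problem details (dynamics, rewards, hyperparameters) are
# illustrative assumptions, not the dissertation's REVS method.
N_STATES, N_ACTIONS = 2, 2
ALPHA, GAMMA = 0.1, 0.9  # TD step size and discount factor

def step(state, action):
    """Toy deterministic dynamics: action 1 pays reward 1.0 and leads to
    state 1; action 0 pays reward 0.0 and leads to state 0."""
    return (1, 1.0) if action == 1 else (0, 0.0)

def td_evaluate(policy, sweeps=200):
    """TD(0) evaluation of a fixed policy: nudge V(s) toward the
    one-step bootstrapped target r + gamma * V(s')."""
    v = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES):
            ns, r = step(s, policy[s])
            v[s] += ALPHA * (r + GAMMA * v[ns] - v[s])
    return v

def policy_iteration(max_iters=10):
    """Alternate TD(0) evaluation with greedy one-step-lookahead
    improvement until the policy stops changing."""
    policy = [0] * N_STATES
    v = [0.0] * N_STATES
    for _ in range(max_iters):
        v = td_evaluate(policy)
        new_policy = [
            max(range(N_ACTIONS),
                key=lambda a: step(s, a)[1] + GAMMA * v[step(s, a)[0]])
            for s in range(N_STATES)
        ]
        if new_policy == policy:
            break
        policy = new_policy
    return policy, v

print(policy_iteration()[0])  # greedy action per state after convergence
```

On this toy problem, the improvement step quickly settles on always taking action 1, the only rewarding action; a full online variant would interleave single TD updates with improvement rather than running evaluation sweeps to near-convergence.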