STOR Colloquium: George Lan, Georgia Institute of Technology
Stochastic optimization methods for reinforcement learning
ABSTRACT: Reinforcement Learning (RL) has recently attracted considerable interest from both industry and academia. The study of RL algorithms with provable rates of convergence, however, is still in its infancy. In this talk, we discuss recent progress on two fundamental RL problems, namely stochastic policy evaluation and policy optimization, building on our studies of stochastic optimization methods. More specifically, we develop a novel analysis of temporal difference (TD) learning and present a new conditional TD (CTD) algorithm and a fast TD (FTD) algorithm that achieve the best-known convergence rates to date for policy evaluation. For policy optimization, we introduce a new class of policy mirror descent (PMD) methods and show that they achieve linear convergence in the deterministic case and optimal sampling complexity in the stochastic case, regardless of whether the RL problem is regularized.
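As background for the policy-evaluation part of the talk, the sketch below shows plain tabular TD(0) on a small, hypothetical two-state Markov reward process (the transition matrix, rewards, and step-size schedule are illustrative choices, not taken from the speaker's work; the CTD/FTD algorithms themselves are not reproduced here):

```python
import numpy as np

# Hypothetical 2-state Markov reward process under a fixed policy
# (all numbers here are made up for illustration).
rng = np.random.default_rng(0)
P = np.array([[0.7, 0.3],        # transition probabilities
              [0.4, 0.6]])
r = np.array([1.0, 0.0])         # expected one-step reward per state
gamma = 0.9                      # discount factor

# Exact value function V = (I - gamma * P)^{-1} r, for reference.
V_true = np.linalg.solve(np.eye(2) - gamma * P, r)

# Tabular TD(0): V(s) <- V(s) + alpha_t * (r(s) + gamma * V(s') - V(s)).
V = np.zeros(2)
s = 0
for t in range(200_000):
    s_next = rng.choice(2, p=P[s])
    alpha = 1.0 / (1 + t) ** 0.7         # a diminishing step-size schedule
    V[s] += alpha * (r[s] + gamma * V[s_next] - V[s])
    s = s_next

print(V_true, V)  # the TD iterates approach the exact values
```

Quantifying how fast such stochastic iterates converge, and accelerating them, is precisely the subject of the convergence-rate analysis discussed in the talk.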
BIO: Guanghui (George) Lan is the A. Russell Chandler III Professor in the H. Milton Stewart School of Industrial and Systems Engineering at the Georgia Institute of Technology. Dr. Lan was on the faculty of the Department of Industrial and Systems Engineering at the University of Florida from 2009 to 2015, after earning his Ph.D. from the Georgia Institute of Technology in August 2009. His main research interests lie in optimization and machine learning. His academic honors include being a finalist for the Mathematical Optimization Society Tucker Prize (2012), first place in the INFORMS Junior Faculty Interest Group Paper Competition (2012), and the National Science Foundation CAREER Award (2013). Dr. Lan serves as an associate editor for Mathematical Programming, SIAM Journal on Optimization, and Computational Optimization and Applications. He is also an associate director of the Center for Machine Learning at Georgia Tech.