PhD Defense: Dhruv Patel
30 Mar @ 8:00 am - 10:00 am
Community Dynamic Factor Models: Correlation Networks, Clustering Guarantees and Vector Autoregressions
Network-based techniques for high-dimensional time series data have gained popularity due to their success in various fields, such as neuroscience and econometrics. These methods treat component time series as nodes and assume an underlying network structure drives the dynamics of the time series. Although these methods provide key insights in practice, few theoretical justifications and guarantees exist.
To address this issue, we introduce the Community Dynamic Factor Model (CDFM), an extension of the Dynamic Factor Model (DFM) that provides theoretical foundations for network-based community detection in observed node-level time series. The CDFM enforces a community structure in the DFM by assuming that the loading vectors are drawn from a mixture in which each component distribution represents a different community. The correlation matrix of the CDFM inherits an approximate low-rank structure from the mixture, which resembles the block-like structure commonly observed in correlation matrices in real-world applications.
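As a rough illustration of the block-like correlation structure described above, the following sketch simulates a toy factor model whose loading vectors are drawn from a two-component mixture. Everything specific here is an assumption made for illustration and is not taken from the thesis: the function name `simulate_cdfm`, the community centers, the Gaussian spread of the loadings, the AR(1) factor dynamics, and the noise level.

```python
import numpy as np

def simulate_cdfm(n_nodes=60, n_time=2000, noise_sd=0.5, seed=0):
    """Simulate a toy community-structured factor model (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Hypothetical community centers for the loading vectors (one row each).
    centers = np.array([[1.0, 0.0], [0.0, 1.0]])
    n_communities, n_factors = centers.shape
    # Alternate community membership so both communities are populated.
    labels = np.arange(n_nodes) % n_communities
    # Loadings: mixture of Gaussians centered at the community centers.
    loadings = centers[labels] + 0.1 * rng.standard_normal((n_nodes, n_factors))
    # Latent factors: independent AR(1) processes (an assumed dynamic).
    phi = 0.5
    factors = np.zeros((n_time, n_factors))
    for t in range(1, n_time):
        factors[t] = phi * factors[t - 1] + rng.standard_normal(n_factors)
    # Observed series: loadings times factors plus idiosyncratic white noise.
    series = factors @ loadings.T + noise_sd * rng.standard_normal((n_time, n_nodes))
    return series, labels

series, labels = simulate_cdfm()
corr = np.corrcoef(series, rowvar=False)  # node-by-node correlation matrix

# Compare average correlation within and between communities.
same = labels[:, None] == labels[None, :]
off_diag = ~np.eye(len(labels), dtype=bool)
within = corr[same & off_diag].mean()
between = corr[~same].mean()
print(f"mean within-community correlation:  {within:.2f}")
print(f"mean between-community correlation: {between:.2f}")
```

In this toy setting the within-community correlations should be clearly larger than the between-community ones, which is the block pattern the abstract refers to.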
To provide clustering guarantees for the CDFM, we prove a result on Lloyd’s algorithm, or naive k-means, for additively perturbed samples from a sub-Gaussian mixture distribution. In practice, samples from a mixture distribution are often estimated rather than directly observed. When this estimation error can be bounded, we prove a misclustering rate bound that depends on the estimation error and on a measure of the signal-to-noise ratio of the mixture distribution.
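The setting above can be sketched as follows: samples from a two-component Gaussian mixture are additively perturbed to mimic estimation error, and a plain Lloyd iteration is run on the perturbed points. This is a minimal sketch, not the procedure analyzed in the thesis; the helper `lloyd`, the mixture centers, the noise levels, and the initialization from the first two points are all assumptions chosen for illustration.

```python
import numpy as np

def lloyd(points, init_centers, n_iter=50):
    """Plain Lloyd iteration: alternate assignment and center updates."""
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = points[assign == k]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment consistent with the returned centers.
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centers

rng = np.random.default_rng(1)
true_centers = np.array([[2.0, 0.0], [-2.0, 0.0]])  # assumed mixture centers
labels = np.arange(200) % 2
samples = true_centers[labels] + 0.5 * rng.standard_normal((200, 2))
# Additive perturbation standing in for bounded estimation error.
perturbed = samples + 0.2 * rng.standard_normal((200, 2))

# Assumed initialization: one perturbed point from each community.
assign, centers = lloyd(perturbed, perturbed[[0, 1]])
# Misclustering rate, minimized over the two possible label permutations.
error = min(np.mean(assign != labels), np.mean(assign == labels))
print(f"misclustering rate: {error:.3f}")
```

Increasing the perturbation scale or moving the assumed centers closer together lowers the effective signal-to-noise ratio, and the misclustering rate in this toy example should rise accordingly.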
Our work also establishes connections between the CDFM and other existing network-modulated time series models by proving that the DFM can generally be expressed as a Vector AutoRegressive Moving Average (VARMA) model. This allows us to apply the well-established methodology for analyzing and forecasting time series that is available for the VARMA model to the DFM, and hence to the CDFM. Thus, we significantly enhance the utility of the CDFM as a tool for analyzing complex high-dimensional time series data.
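To give a feel for why a DFM can admit a VARMA form, here is a sketch of a simple special case. It is not the general result of the thesis. The notation and assumptions are introduced here for illustration only: the factors follow a VAR(1), the idiosyncratic noise is white and uncorrelated with the factor innovations, and the loading matrix has full column rank.

```latex
% Assumed special case of a DFM:
%   X_t = \Lambda f_t + \varepsilon_t,   f_t = \Phi f_{t-1} + \eta_t,
% with \varepsilon_t and \eta_t mutually uncorrelated white noise and
% \Lambda of full column rank, so \Lambda^{+} = (\Lambda^\top \Lambda)^{-1} \Lambda^\top
% satisfies \Lambda^{+} \Lambda = I.
\begin{align*}
f_{t-1} &= \Lambda^{+}\,(X_{t-1} - \varepsilon_{t-1}), \\
X_t &= \Lambda \Phi f_{t-1} + \Lambda \eta_t + \varepsilon_t \\
    &= \underbrace{\Lambda \Phi \Lambda^{+}}_{A}\, X_{t-1}
       + \underbrace{\Lambda \eta_t + \varepsilon_t - A\,\varepsilon_{t-1}}_{u_t}.
\end{align*}
% The autocovariances of u_t vanish beyond lag 1, so u_t admits an MA(1)
% representation, and X_t - A X_{t-1} = u_t is a VARMA(1,1) form.
```

Under these assumptions the observed series satisfies a VARMA(1,1) equation, which is the kind of representation that makes standard VARMA estimation and forecasting tools applicable.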