Graduate Seminar: Kelly Bodwin and John Palowitch
Object-Set Testing Methods for Association Mining
The exploratory mining of hidden, modular subgroups in complex systems is an ever-important task in our increasingly interconnected world. Go-to clustering approaches still involve classical methods like k-means, agglomerative pairing, and latent-space modeling. These methods usually operate via an objective score of a partition of the objects in the system under study: the higher the score, the stronger the “modularity” of the partition. In this talk, we present an alternative framework called Object-Set Testing (OST), in which significant sets (clusters) are discovered through their associations with individual objects. The central component of OST is a set-updating algorithm that uses hypothesis tests to find and filter associated objects. The algorithm is quite general and allows the use of intuitive notions of association in almost any data setting. Furthermore, the flexible, set-by-set nature of the algorithm allows for cluster overlap, un-clustered objects, and a fully adaptive number of clusters. We introduce three new OST approaches to diverse data settings, each with its own practical and analytical challenges, and present relevant real-data applications and theoretical results.
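To make the set-updating idea concrete, here is a minimal sketch of one way such a loop could look. It is an illustration under stated assumptions, not the speakers' algorithm: the abstract does not specify the association measure, the test, or the filtering rule. The sketch assumes objects are numeric variables observed on common samples, measures association as the Pearson correlation between an object and the average of the current set, uses a one-sided Fisher z-test, and filters with a Benjamini-Hochberg step. All function names (`pearson`, `p_value`, `benjamini_hochberg`, `update_set`, `object_set_search`) and the toy data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def p_value(r, n):
    """One-sided p-value for positive correlation via Fisher's z (assumes n > 3)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 0.5 * math.erfc(z / math.sqrt(2))


def benjamini_hochberg(pvals, alpha):
    """Return the objects rejected by the Benjamini-Hochberg step-up rule."""
    ordered = sorted(pvals, key=pvals.get)
    m = len(ordered)
    cutoff = 0
    for rank, obj in enumerate(ordered, start=1):
        if pvals[obj] <= alpha * rank / m:
            cutoff = rank
    return set(ordered[:cutoff])


def update_set(columns, current, alpha):
    """Test every object against the current set; keep the significant ones."""
    n = len(next(iter(columns.values())))
    pvals = {}
    for obj, col in columns.items():
        # Exclude the object itself so it is not tested against its own values.
        others = [columns[o] for o in current if o != obj]
        if not others:
            pvals[obj] = 1.0
            continue
        summary = [sum(vals) / len(others) for vals in zip(*others)]
        pvals[obj] = p_value(pearson(col, summary), n)
    return benjamini_hochberg(pvals, alpha)


def object_set_search(columns, seed, alpha=0.05, max_iter=50):
    """Iterate the update from a seed set until it stops changing (or max_iter)."""
    current = set(seed)
    for _ in range(max_iter):
        updated = update_set(columns, current, alpha)
        if updated == current:
            break
        current = updated
    return current


# Toy data (hypothetical): four objects share a latent factor, four are pure noise.
rng = random.Random(0)
n_samples = 200
latent = [rng.gauss(0, 1) for _ in range(n_samples)]
columns = {}
for k in range(4):
    columns[f"signal{k}"] = [v + 0.5 * rng.gauss(0, 1) for v in latent]
for k in range(4):
    columns[f"noise{k}"] = [rng.gauss(0, 1) for _ in range(n_samples)]

found = object_set_search(columns, seed={"signal0"})
print(sorted(found))
```

Running the search from different seed sets and keeping each stable set separately is what would, in a scheme like this, permit overlapping clusters, objects that belong to no cluster, and a number of clusters that is not fixed in advance.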