Statistics Seminar

Karl RoheUniversity of Wisconsin-Madison
Making Spectral Graph Theory work in practice. Making the practice work in theory.

Wednesday, November 14, 2018 - 4:15pm
Biotech G01

Abstract:After introducing Cheeger's Inequality and spectral clustering, this talk has two parts. The first part will (1) show how spectral clustering gives "bad results" in many applied settings and (2) illustrate a "hack" that makes it work very well. People call the hack "regularized spectral clustering,” but it is not clear why it has regularization properties. The talk will provide a simple theory that provides a deeper understanding of where the bad results come from and why the hack works so well. This theory explains the regularization by showing that spectral clustering overfits to graph conductance.

There are four pieces to this simple theory. First, sparse and stochastic graphs create a lot of small trees that are connected to the core of the graph by only one edge. Second, graph conductance is sensitive to these noisy "dangling sets." Third, by Cheeger's inequality and an inequality from Ky Fan, spectral clustering inherits this sensitivity. These three pieces explain why spectral clustering gives bad results in practice. The fourth piece uses Cheeger's inequality to show how the hack creates a new form of graph conductance that we call CoreCut. Simple inspection of CoreCut reveals why it is less sensitive to small cuts in the graph. In addition to this statistical benefit, these results also demonstrate why the regularization also improves the computational speed of spectral clustering. This is joint work with Yilin Zhang. https://arxiv.org/abs/1806.01468. Some ideas will be illustrated with data from murmuration.wisc.edu/