Scalable community detection in massive networks via predictive inference
Massive networks are becoming increasingly common in applications such as social media, neuroscience, epidemiology, and healthcare analytics. Existing community detection methods are infeasible for such large-scale networks for two reasons. First, the full network must be stored and processed in a single server, resulting in prohibitively high memory costs. Second, existing methods typically use matrix factorization or iterative optimization, leading to high runtimes. We propose a strategy called predictive inference to enable computationally efficient community detection while also ensuring statistical accuracy. The core idea is to avoid large-scale matrix computations by splitting the task into two steps, one smaller matrix computation plus a large number of vector computations that can be performed in parallel. In the first step, community detection is carried out on a small subgraph to estimate the community membership of subgraph nodes and model parameters. In the second step, each remaining node is assigned to a community by using these estimated quantities. We study the theoretical and empirical performance of predictive inference for spectral clustering and bias-adjusted spectral clustering under the stochastic blockmodel and its degree-corrected version. This is joint work with Subhankar Bhadra (North Carolina State University) and Marianna Penskty (University of Central Florida).