Hypothesis testing and clustering of large networks
In hypothesis testing for networks, we study the problem of testing whether two large graphs are generated from the same model. This problem underlies questions such as whether the brain networks of Alzheimer's patients differ from those of healthy individuals. On the applied end, we proposed new two-sample tests for comparing individual graphs as well as graph populations. Our theory resolved a fundamental question: when can we test large networks without access to multiple independent samples? Using statistical minimax theory, we showed that the solvability of a testing problem with limited data depends crucially on the hypotheses under consideration, and that some classical approaches to testing fail in this setup.
Clustering can be viewed as a further complication of the testing problem: given a graph population, the goal is to split the graphs into two or more statistically coherent groups. The problem is particularly challenging when the graphs are defined on different vertex sets. To address this non-alignment of vertices, we consider clustering procedures based on graphons, which arise as limits of sequences of large graphs. Graphon-based graph clustering algorithms come with statistical guarantees as well as superior empirical performance compared to a range of alternatives, including graph kernels, network statistics, and graph neural tangent kernels.
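A minimal sketch of the graphon-based clustering idea, under simplifying assumptions: each graph, regardless of its vertex count, is embedded as a blockwise graphon estimate on a fixed grid (vertices sorted by degree, then edge densities averaged per block), and the embeddings are clustered with plain k-means. This is an illustrative pipeline, not the specific algorithm developed in this project.

```python
import numpy as np

def graphon_estimate(adj, k=8):
    """Crude graphon estimate: sort vertices by degree, partition into k
    blocks, and average edge densities per block (a stochastic-block-model
    style approximation on a fixed k-by-k grid)."""
    n = adj.shape[0]
    order = np.argsort(-adj.sum(axis=1))     # degree-sorted vertex order
    a = adj[np.ix_(order, order)]
    blocks = np.array_split(np.arange(n), k)
    est = np.empty((k, k))
    for i, bi in enumerate(blocks):
        for j, bj in enumerate(blocks):
            est[i, j] = a[np.ix_(bi, bj)].mean()
    return est

def cluster_graphs(graphs, n_clusters=2, k=8, n_iter=20, seed=0):
    """Cluster graphs (possibly on different vertex sets) via k-means on
    their flattened graphon estimates, with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    X = np.array([graphon_estimate(g, k).ravel() for g in graphs])
    idx = [int(rng.integers(len(X)))]
    for _ in range(n_clusters - 1):          # farthest-point initialization
        d = ((X[:, None] - X[idx][None]) ** 2).sum(-1).min(axis=1)
        idx.append(int(np.argmax(d)))
    centers = X[idx].copy()
    for _ in range(n_iter):                  # Lloyd iterations
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels
```

Because every graph is mapped to the same k-by-k grid before clustering, the procedure needs no vertex correspondence between graphs, which is the point of working with graphon estimates rather than adjacency matrices directly.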
Funding
- Baden-Württemberg Foundation through BW Eliteprogramm for postdocs project "Clustering large evolving networks"
- Research on testing previously funded through DFG Research Unit FOR1735
Publications
- D. Ghoshdastidar, M. Gutzeit, A. Carpentier, U. von Luxburg. Two-sample hypothesis testing for inhomogeneous random graphs. The Annals of Statistics, 48 (4): 2208-2229, 2020. [Paper] [Preprint]
- D. Ghoshdastidar, U. von Luxburg. Practical methods for graph two-sample testing. NeurIPS, 2018. [Paper] [Preprint]
- D. Ghoshdastidar, M. Gutzeit, A. Carpentier, U. von Luxburg. Two-sample tests for large random graphs using network statistics. COLT, 2017. [Preprint]