# Inference

## Two-graph hypothesis testing

class graspy.inference.LatentPositionTest(embedding='ase', n_components=None, n_bootstraps=500, test_case='rotation')

Two-sample hypothesis test for the problem of determining whether two random dot product graphs have the same latent positions.

This test assumes that the two input graphs are vertex aligned: there is a known mapping between the vertices of the two graphs, and both graphs have their vertices sorted in the same order. Currently, the test only supports undirected graphs.
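To make "vertex aligned" concrete, here is a minimal NumPy sketch. It assumes the known mapping is given as a hypothetical index array `perm`, where vertex `i` of the first graph corresponds to vertex `perm[i]` of the second. The helper name `align_to_first` is illustrative and not part of graspy.

```python
import numpy as np


def align_to_first(A2, perm):
    """Reorder A2 so row/column i refers to the same vertex as in the first graph.

    perm[i] is the index in A2 of the vertex matched to vertex i of the
    first graph (an input format assumed for this sketch).
    """
    perm = np.asarray(perm)
    return A2[np.ix_(perm, perm)]


# First graph: the path 0-1-2.
A1 = np.array([[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]])

# Second graph: the same path with its vertex labels shuffled.
perm = [2, 0, 1]
A2 = np.zeros((3, 3), dtype=int)
for i in range(3):
    for j in range(3):
        A2[perm[i], perm[j]] = A1[i, j]

# After reordering, both adjacency matrices use the same vertex order.
A2_aligned = align_to_first(A2, perm)
```

Only after this kind of reordering does comparing row `i` of one embedding with row `i` of the other make sense.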

Read more in the tutorials

Parameters:

- embedding : string, {'ase' (default), 'omnibus'}. String describing the embedding method to use.
  - 'ase': Embed each graph separately using adjacency spectral embedding and use Procrustes to align the embeddings.
  - 'omnibus': Embed all graphs simultaneously using omnibus embedding.
- n_components : None (default) or int. Number of embedding dimensions. If None, the optimal embedding dimensions are found by the Zhu and Ghodsi algorithm.
- test_case : string, {'rotation' (default), 'scalar-rotation', 'diagonal-rotation'}. Describes the exact form of the hypothesis to test when using 'ase' as the embedding method. Ignored if using 'omnibus'. Given two latent positions, $X_1$ and $X_2$, and an orthogonal rotation matrix $R$ that minimizes $\|X_1 - X_2 R\|_F$:
  - 'rotation': $H_0: X_1 = X_2 R$
  - 'scalar-rotation': $H_0: X_1 = c X_2 R$, where $c$ is a scalar, $c > 0$
  - 'diagonal-rotation': $H_0: X_1 = D X_2 R$, where $D$ is an arbitrary diagonal matrix
- n_bootstraps : int, optional (default 500). Number of bootstrap simulations to run to generate the null distribution.

Attributes:

- null_distribution_1_, null_distribution_2_ : np.ndarray, shape (n_bootstraps,). The distributions of T statistics generated under the null, using the first and second input graph, respectively. The latent positions of each sample graph are used independently to sample random dot product graphs, so two null distributions are generated.
- sample_T_statistic_ : float. The observed difference between the embedded positions of the two input graphs after an alignment (the type of alignment depends on test_case).
- p_value_1_, p_value_2_ : float. The p values estimated from the null distributions from sample 1 and sample 2.
- p_ : float. The overall p value from the test; this is the max of p_value_1_ and p_value_2_.
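The 'rotation' case can be illustrated with a short NumPy sketch. This is not graspy's implementation: `ase`, `rotation_statistic`, and `sample_rdpg` are hypothetical helpers written for this example, the embedding is a plain truncated SVD, and the statistic is assumed to be the Frobenius distance after the best orthogonal alignment.

```python
import numpy as np


def ase(A, d):
    """Sketch of adjacency spectral embedding: top-d singular vectors
    scaled by the square roots of the singular values."""
    U, s, _ = np.linalg.svd(A)
    return U[:, :d] * np.sqrt(s[:d])


def rotation_statistic(X1, X2):
    """Return ||X1 - X2 R||_F for the orthogonal R that minimizes it."""
    # Orthogonal Procrustes: with X2^T X1 = U S V^T, the minimizer is R = U V^T.
    U, _, Vt = np.linalg.svd(X2.T @ X1)
    R = U @ Vt
    return np.linalg.norm(X1 - X2 @ R, ord="fro")


def sample_rdpg(P, rng):
    """Sample a symmetric, hollow adjacency matrix with edge probabilities P."""
    upper = np.triu(rng.random(P.shape) < P, k=1)
    return (upper + upper.T).astype(float)


rng = np.random.default_rng(0)
n, d = 50, 2
X = rng.uniform(0.2, 0.7, size=(n, d))  # latent positions; dot products stay in (0, 1)
P = X @ X.T

# A rotated copy of the same latent positions gives a statistic of (numerically) zero.
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
T_rotated = rotation_statistic(X, X @ Q)

# Two graphs sampled from the same latent positions give a small positive statistic.
A1, A2 = sample_rdpg(P, rng), sample_rdpg(P, rng)
T_sample = rotation_statistic(ase(A1, d), ase(A2, d))
```

The 'scalar-rotation' and 'diagonal-rotation' cases would additionally rescale the embeddings before alignment; they are left out of this sketch.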

References

[1] Tang, M., Athreya, A., Sussman, D. L., Lyzinski, V., Park, Y., & Priebe, C. E. (2017). "A Semiparametric Two-Sample Hypothesis Testing Problem for Random Graphs." Journal of Computational and Graphical Statistics, 26(2).

fit(self, A1, A2)

Fits the test to the two input graphs

Parameters:

- A1, A2 : nx.Graph, nx.DiGraph, nx.MultiDiGraph, nx.MultiGraph, np.ndarray. The two graphs to run a hypothesis test on. If np.ndarray, the shape must be (n_vertices, n_vertices) for both graphs, where n_vertices is the same for both.

Returns:

- p : float. The p value corresponding to the specified hypothesis test.

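
The class description states that the overall p value is the max of the two bootstrap p values. The sketch below shows that combination on made-up numbers. It assumes each bootstrap p value is the plain fraction of null statistics at least as large as the observed one; the library may apply a correction that is not reproduced here, and `bootstrap_p_value` is a hypothetical helper.

```python
def bootstrap_p_value(null_distribution, observed):
    """Fraction of null statistics that are at least as large as the observed one."""
    exceed = sum(1 for t in null_distribution if t >= observed)
    return exceed / len(null_distribution)


# Illustrative values only, not output from a real test.
null_1 = [0.8, 1.1, 0.9, 1.4, 1.0]  # null statistics from the first graph
null_2 = [1.2, 1.5, 1.3, 0.7, 1.6]  # null statistics from the second graph
observed = 1.25

p1 = bootstrap_p_value(null_1, observed)  # 1 of 5 null values >= 1.25
p2 = bootstrap_p_value(null_2, observed)  # 3 of 5 null values >= 1.25
p = max(p1, p2)                           # the more conservative of the two
```

Taking the maximum means the null hypothesis is rejected only when the observed statistic is extreme relative to both null distributions.
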
get_params(self, deep=True)

Get parameters for this estimator.

Parameters:

- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

- params : mapping of string to any. Parameter names mapped to their values.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

- **params : dict. Estimator parameters.

Returns:

- self : object. Estimator instance.

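
The `<component>__<parameter>` naming can be illustrated with a small pure-Python sketch of how such keys might be routed. This is not the estimator's actual code: `split_nested_params` is a hypothetical helper, and `embedder` is a made-up component name.

```python
def split_nested_params(params):
    """Separate plain parameters from '<component>__<parameter>' ones.

    Returns (own, nested): own maps parameter names to values, and nested
    maps each component name to the parameters addressed to it.
    """
    own, nested = {}, {}
    for key, value in params.items():
        # Split at the first double underscore only.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"n_bootstraps": 1000, "embedder__n_components": 3}
)
```

Here `own` holds the parameters for the estimator itself, while each entry of `nested` would be forwarded to the named component's own `set_params`.
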
class graspy.inference.LatentDistributionTest(test='dcorr', metric='euclidean', n_components=None, n_bootstraps=200, workers=1)

Two-sample hypothesis test for the problem of determining whether two random dot product graphs have the same distributions of latent positions.

This test can operate on two graphs where there is no known matching between the vertices of the two graphs. Currently, testing is only supported for undirected graphs.

Read more in the tutorials

Parameters:

- test : str (default="dcorr"). Backend hypothesis test to use, one of ["cca", "dcorr", "hhg", "rv", "hsic", "mgc"]. These tests are typically used for independence testing, but here they are used for a two-sample hypothesis test on the latent positions of two graphs. See hyppo.ksample.KSample for more information.
- metric : str or function (default="euclidean"). Distance metric to use, either a callable or a valid string. A callable should behave similarly to sklearn.metrics.pairwise_distances(). A string should be one of the keys in sklearn.metrics.pairwise.PAIRED_DISTANCES, or "gaussian", which uses a Gaussian kernel on Euclidean distances with an adaptively selected bandwidth.
- n_components : int or None, optional (default=None). Number of embedding dimensions. If None, the optimal embedding dimensions are found by the Zhu and Ghodsi algorithm. See selectSVD() for more information.
- n_bootstraps : int (default=200). Number of bootstrap iterations for the backend hypothesis test. See hyppo.ksample.KSample for more information.
- workers : int, optional (default=1). Number of workers to use. If more than 1, parallelizes the code.

Attributes:

- sample_T_statistic_ : float. The observed difference between the embedded latent positions of the two input graphs.
- p_value_ : float. The overall p value from the test.
- null_distribution_ : ndarray, shape (n_bootstraps,). The distribution of T statistics generated under the null.
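The idea of a two-sample test on latent positions with no vertex matching can be sketched with NumPy. The example below swaps in a simple stand-in, an energy-distance statistic with a permutation null, rather than any of the hyppo backends listed above, and it works directly on point clouds instead of embedding graphs first. The names `energy_statistic` and `permutation_test` are hypothetical.

```python
import numpy as np


def energy_statistic(X, Y):
    """Energy distance between two point clouds (stand-in test statistic)."""
    def mean_dist(A, B):
        diff = A[:, None, :] - B[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()
    return 2 * mean_dist(X, Y) - mean_dist(X, X) - mean_dist(Y, Y)


def permutation_test(X, Y, n_permutations=200, seed=0):
    """Permutation p value for the hypothesis that X and Y share a distribution."""
    rng = np.random.default_rng(seed)
    observed = energy_statistic(X, Y)
    pooled = np.vstack([X, Y])
    n = len(X)
    null = np.empty(n_permutations)
    for b in range(n_permutations):
        # Randomly reassign points to the two samples, keeping the sample sizes.
        idx = rng.permutation(len(pooled))
        null[b] = energy_statistic(pooled[idx[:n]], pooled[idx[n:]])
    # Add-one correction keeps the p value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, p_value, null


# Two sets of latent positions with different sizes and clearly different means.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 0.1, size=(40, 2)) + 0.3
Y = rng.normal(0.0, 0.1, size=(60, 2)) + 0.6
stat, p_value, null = permutation_test(X, Y)
```

Because the statistic only compares the two clouds as sets of points, the samples may have different sizes and need no correspondence between rows, which is what lets this kind of test run on graphs without a known vertex matching.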

References

[1] Tang, M., Athreya, A., Sussman, D. L., Lyzinski, V., & Priebe, C. E. (2017). "A nonparametric two-sample hypothesis testing problem for random graphs." Bernoulli, 23(3), 1599-1630.

[2] Panda, S., Palaniappan, S., Xiong, J., Bridgeford, E., Mehta, R., Shen, C., & Vogelstein, J. (2019). "hyppo: A Comprehensive Multivariate Hypothesis Testing Python Package." arXiv:1907.02088.

[3] Varjavand, B., Arroyo, J., Tang, M., Priebe, C., & Vogelstein, J. (2019). "Improving Power of 2-Sample Random Graph Tests with Applications in Connectomics." arXiv:1911.02741.

fit(self, A1, A2)

Fits the test to the two input graphs

Parameters:

- A1, A2 : nx.Graph, nx.DiGraph, nx.MultiDiGraph, nx.MultiGraph, np.ndarray. The two graphs to run a hypothesis test on.

Returns:

- p_value : float. The p value corresponding to the specified hypothesis test.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:

- deep : bool, default=True. If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

- params : mapping of string to any. Parameter names mapped to their values.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

- **params : dict. Estimator parameters.

Returns:

- self : object. Estimator instance.