Inference
Two-graph hypothesis testing

class graspologic.inference.LatentPositionTest(embedding='ase', n_components=None, n_bootstraps=500, test_case='rotation')
Two-sample hypothesis test for the problem of determining whether two random dot product graphs have the same latent positions.
This test assumes that the two input graphs are vertex aligned, that is, there is a known mapping between vertices in the two graphs and the input graphs have their vertices sorted in the same order. Currently, the function only supports undirected graphs.
Read more in the tutorials
Parameters: embedding : string, { 'ase' (default), 'omnibus'}
String describing the embedding method to use:
 'ase'
 Embed each graph separately using adjacency spectral embedding and use Procrustes to align the embeddings.
 'omnibus'
 Embed all graphs simultaneously using omnibus embedding.
n_components : None (default), or int
Number of embedding dimensions. If None, the optimal embedding dimensions are found by the Zhu and Ghodsi algorithm.
test_case : string, {'rotation' (default), 'scalar-rotation', 'diagonal-rotation'}
Describes the exact form of the hypothesis to test when using 'ase' or 'lse' as an embedding method. Ignored if using 'omnibus'. Given two latent positions, \(X_1\) and \(X_2\), and an orthogonal rotation matrix \(R\) that minimizes \(\|X_1 - X_2 R\|_F\):
 'rotation'
 \[H_o: X_1 = X_2 R\]
 'scalar-rotation'
 \[H_o: X_1 = c X_2 R\]
where \(c\) is a scalar, \(c > 0\)
 'diagonal-rotation'
 \[H_o: X_1 = D X_2 R\]
where \(D\) is an arbitrary diagonal matrix
n_bootstraps : int, optional (default 500)
Number of bootstrap simulations to run to generate the null distribution
Attributes: null_distribution_1_, null_distribution_2_ : np.ndarray (n_bootstraps,)
The distributions of T statistics generated under the null, using the first and second input graph, respectively. The latent positions of each sample graph are used independently to sample random dot product graphs, so two null distributions are generated.
sample_T_statistic_ : float
The observed difference between the embedded positions of the two input graphs after an alignment (the type of alignment depends on test_case).
p_value_1_, p_value_2_ : float
The p value estimated from the null distributions from sample 1 and sample 2, respectively.
p_value_ : float
The overall p value from the test; this is the max of p_value_1_ and p_value_2_.
References
[1] Tang, M., A. Athreya, D. Sussman, V. Lyzinski, Y. Park, C. E. Priebe. "A Semiparametric Two-Sample Hypothesis Testing Problem for Random Graphs." Journal of Computational and Graphical Statistics, Vol. 26(2), 2017.
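Examples
The 'rotation' test case compares two sets of latent positions after the best orthogonal alignment. The following is a minimal NumPy sketch of that statistic, not the library's implementation: the helper names `procrustes_rotation` and `rotation_statistic` are hypothetical, and the alignment uses the classical SVD solution to the orthogonal Procrustes problem.

```python
import numpy as np


def procrustes_rotation(X1, X2):
    """Orthogonal R minimizing ||X1 - X2 R||_F (classical Procrustes solution)."""
    # If X2^T X1 = U S V^T, the minimizer is R = U V^T.
    U, _, Vt = np.linalg.svd(X2.T @ X1)
    return U @ Vt


def rotation_statistic(X1, X2):
    """Frobenius distance between X1 and the best rotation of X2."""
    R = procrustes_rotation(X1, X2)
    return np.linalg.norm(X1 - X2 @ R, ord="fro")


rng = np.random.default_rng(0)
X2 = rng.uniform(0.1, 0.9, size=(50, 2))

# X1 is an exact rotation of X2, so the null hypothesis X1 = X2 R holds.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
X1 = X2 @ Q

stat_same = rotation_statistic(X1, X2)  # numerically zero
# Perturbing X1 breaks the rotation relationship and inflates the statistic.
X1_noisy = X1 + rng.normal(scale=0.2, size=X1.shape)
stat_diff = rotation_statistic(X1_noisy, X2)
print(stat_same, stat_diff)
```

In the test itself the latent positions are not observed; they are estimated by embedding each graph, and the statistic is compared against bootstrapped null distributions.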
fit(self, A1, A2)
Fits the test to the two input graphs.
Parameters: A1, A2 : nx.Graph, nx.DiGraph, nx.MultiDiGraph, nx.MultiGraph, np.ndarray
The two graphs to run a hypothesis test on. If np.ndarray, shape must be (n_vertices, n_vertices) for both graphs, where n_vertices is the same for both.
Returns: self

fit_predict(self, A1, A2)
Fits the model and returns the p-value.
Parameters: A1, A2 : nx.Graph, nx.DiGraph, nx.MultiDiGraph, nx.MultiGraph, np.ndarray
The two graphs to run a hypothesis test on. If np.ndarray, shape must be (n_vertices, n_vertices) for both graphs, where n_vertices is the same for both.
Returns: p_value_ : float
The overall p value from the test
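The overall p value is described in the attributes as the larger of the two p values obtained from the two null distributions. A rough sketch of that combination follows; the function `empirical_p_value` and its add-one correction are assumptions of this example, and the library's exact estimator may differ.

```python
import numpy as np


def empirical_p_value(null_distribution, observed):
    """Share of null statistics at least as large as the observed one."""
    null_distribution = np.asarray(null_distribution)
    # Add-one correction (an assumption here) keeps the estimate above zero.
    n_extreme = np.sum(null_distribution >= observed)
    return (n_extreme + 1) / (null_distribution.size + 1)


rng = np.random.default_rng(0)
# Stand-ins for null_distribution_1_ and null_distribution_2_.
null_1 = rng.normal(loc=0.0, size=500)
null_2 = rng.normal(loc=0.5, size=500)
observed = 3.0  # stand-in for sample_T_statistic_

p_value_1 = empirical_p_value(null_1, observed)
p_value_2 = empirical_p_value(null_2, observed)
# Taking the max is the conservative choice: reject only if both agree.
p_value = max(p_value_1, p_value_2)
print(p_value_1, p_value_2, p_value)
```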

get_params(self, deep=True)
Get parameters for this estimator.
Parameters: deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params : mapping of string to any
Parameter names mapped to their values.

set_params(self, **params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters: **params : dict
Estimator parameters.
Returns: self : object
Estimator instance.
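The <component>__<parameter> convention can be illustrated with a toy class. This is only a sketch of the routing idea: the `Estimator` class and the parameter names `aligner` and `n_iter` are hypothetical and are not part of graspologic.

```python
class Estimator:
    """Hypothetical minimal estimator mimicking the nested-parameter convention."""

    def __init__(self, **params):
        self.__dict__.update(params)

    def set_params(self, **params):
        for key, value in params.items():
            # "component__parameter" is forwarded to the sub-object "component".
            name, sep, sub_key = key.partition("__")
            if sep:
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


aligner = Estimator(n_iter=10)
test = Estimator(n_bootstraps=200, aligner=aligner)

# One call updates a top-level parameter and a parameter of the nested object.
result = test.set_params(n_bootstraps=500, aligner__n_iter=50)
print(test.n_bootstraps, aligner.n_iter)
```

Returning the instance from set_params is what allows calls to be chained.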

class graspologic.inference.LatentDistributionTest(test='dcorr', metric='euclidean', n_components=None, n_bootstraps=200, workers=1, size_correction=True, pooled=False, align_type='sign_flips', align_kws={}, input_graph=True)
Two-sample hypothesis test for the problem of determining whether two random dot product graphs have the same distributions of latent positions.
This test can operate on two graphs where there is no known matching between the vertices of the two graphs, or even when the number of vertices is different. Currently, testing is only supported for undirected graphs.
Read more in the tutorials
Parameters: test : str (default="dcorr")
Backend hypothesis test to use, one of ["cca", "dcorr", "hhg", "rv", "hsic", "mgc"]. These tests are typically used for independence testing, but here they are used for a two-sample hypothesis test on the latent positions of two graphs. See hyppo.ksample.KSample for more information.
metric : str or function (default="euclidean")
Distance or a kernel metric to use, either a callable or a valid string. If a callable, then it should behave similarly to either sklearn.metrics.pairwise_distances() or sklearn.metrics.pairwise.pairwise_kernels(). If a string, then it should be either one of the keys in either sklearn.metrics.pairwise.PAIRED_DISTANCES or in sklearn.metrics.pairwise.PAIRWISE_KERNEL_FUNCTIONS, or "gaussian", which will use a gaussian kernel with an adaptively selected bandwidth. It is recommended to use kernels (e.g. "gaussian") with the kernel-based hsic test and distances (e.g. "euclidean") with all other tests.
n_components : int or None (default=None)
Number of embedding dimensions. If None, the optimal embedding dimensions are found by the Zhu and Ghodsi algorithm. See selectSVD() for more information. This argument is ignored if input_graph is False.
n_bootstraps : int (default=200)
Number of bootstrap iterations for the backend hypothesis test. See hyppo.ksample.KSample for more information.
workers : int (default=1)
Number of workers to use. If more than 1, parallelizes the code. Supply -1 to use all cores available to the Process.
size_correction : bool (default=True)
Ignored when the two graphs have the same number of vertices. Unless a correction is performed, the test degrades in validity as the numbers of vertices of the two graphs diverge from each other.
 True
 Whenever the two graphs have different numbers of vertices, estimates the plug-in estimator for the variance and uses it to correct the embedding of the larger graph.
 False
 Does not perform any modifications (not recommended).
pooled : bool (default=False)
Ignored whenever the two graphs have the same number of vertices or size_correction is set to False. To correct the adjacency spectral embedding used in the test, the variance of each latent position estimate in the larger graph must be estimated, which requires computing several sample moments. These moments can be computed either over the larger graph (False) or over both graphs (True). Setting it to True should not affect the behavior of the test under the null hypothesis, but it is not clear under which alternatives it gains or loses power. Generally not recommended, as it is untested and included for experimental purposes.
align_type : str, {'sign_flips' (default), 'seedless_procrustes'} or None
Random dot product graphs have an inherent non-identifiability associated with their latent positions. Thus, two embeddings of different graphs may not be orthogonally aligned. Unless this is accounted for, two embeddings of different graphs may appear different, even if the distributions of the true latent positions are the same. There are several options for how this can be addressed:
 'sign_flips'
 A simple heuristic that flips the signs of one of the embeddings in a given dimension if the medians of the two embeddings in that dimension have different signs. See SignFlips for more information on this procedure. In the limit, this is guaranteed to lead to a valid test, as long as the matrix \(X^T X\), where \(X\) is the matrix of latent positions, does not have repeated nonzero eigenvalues. This may, however, result in an invalid test in the finite sample case if some eigenvalues are the same or close.
 'seedless_procrustes'
 An algorithm that learns an orthogonal alignment matrix. This procedure is slower than sign flips, but is guaranteed to yield a valid test in the limit, and also makes the test more valid in some finite sample cases in which the eigenvalues are very close to each other. See graspologic.align.SeedlessProcrustes for more information on the procedure.
 None
 Do not use any alignment technique. This is strongly not recommended, as it may often result in a test that is not valid.
align_kws : dict
Keyword arguments for the aligner of choice, either graspologic.align.SignFlips or graspologic.align.SeedlessProcrustes, depending on the align_type. See the respective classes for more information.
input_graph : bool (default=True)
Flag whether to expect two full graphs, or the embeddings.
 True
 .fit() and .fit_predict() expect graphs, either as NetworkX graph objects or as adjacency matrices, provided as ndarrays of size (n, n) and (m, m). They will be embedded using adjacency spectral embeddings.
 False
 .fit() and .fit_predict() expect adjacency spectral embeddings of the graphs; they must be ndarrays of size (n, d) and (m, d), where d must be the same. The n_components argument is ignored in this case.
Attributes: metric_func_ : callable
A callable associated with the specified metric. See metric.
null_distribution_ : ndarray, shape (n_bootstraps, )
The distribution of T statistics generated under the null.
sample_T_statistic_ : float
The observed difference between the embedded latent positions of the two input graphs.
p_value_ : float
The overall p value from the test.
References
[1] Tang, M., Athreya, A., Sussman, D. L., Lyzinski, V., & Priebe, C. E. (2017). "A nonparametric two-sample hypothesis testing problem for random graphs." Bernoulli, 23(3), 1599-1630.
[2] Panda, S., Palaniappan, S., Xiong, J., Bridgeford, E., Mehta, R., Shen, C., & Vogelstein, J. (2019). "hyppo: A Comprehensive Multivariate Hypothesis Testing Python Package." arXiv:1907.02088.
[3] Alyakin, A. A., Agterberg, J., Helm, H. S., & Priebe, C. E. (2020). "Correcting a Nonparametric Two-sample Graph Hypothesis Test for Graphs with Different Numbers of Vertices." arXiv:2008.09434.
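Examples
The 'sign_flips' alignment described under align_type can be sketched in a few lines of NumPy. This is an illustration of the median-based heuristic only, under the assumption that a dimension is flipped when the signs of the two medians disagree; the function name `sign_flip_align` is hypothetical and this is not the SignFlips class itself.

```python
import numpy as np


def sign_flip_align(X, Y):
    """Flip the columns of X whose median has the opposite sign to Y's."""
    sign_x = np.sign(np.median(X, axis=0))
    sign_y = np.sign(np.median(Y, axis=0))
    # Treat an exactly-zero median as positive so no column is zeroed out.
    sign_x[sign_x == 0] = 1
    sign_y[sign_y == 0] = 1
    # The product is +1 where the signs agree and -1 where they disagree.
    return X * (sign_x * sign_y)


rng = np.random.default_rng(0)
# Two embeddings with different numbers of rows; the second column of X
# has the opposite sign to the second column of Y.
Y = rng.normal(loc=[1.0, -2.0], scale=0.1, size=(40, 2))
X = rng.normal(loc=[1.0, 2.0], scale=0.1, size=(30, 2))

X_aligned = sign_flip_align(X, Y)
print(np.median(X_aligned, axis=0), np.median(Y, axis=0))
```

Because only signs are changed, this handles the reflection part of the non-identifiability and leaves general rotations to 'seedless_procrustes'.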
fit(self, A1, A2)
Fits the test to the two input graphs.
Parameters: A1, A2 : variable (see description)
The two graphs, or their embeddings, to run a hypothesis test on. The expected variable type and shape depend on the input_graph attribute:
 input_graph=True
 expects two unembedded graphs either as NetworkX graph objects or as two np.ndarrays representing the adjacency matrices. In this case the graphs will be embedded using adjacency spectral embedding.
 input_graph=False
 expects two already embedded graphs. In this case they must be arrays of shape (n, d) and (m, d), where d, the number of components, must be shared.
Note that regardless of how the graphs are passed, they need not have the same number of vertices.
Returns:  self
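When input_graph=False, the caller supplies the embeddings. The sketch below shows one way such (n, d) and (m, d) arrays could be produced with plain NumPy, assuming an eigendecomposition-based adjacency spectral embedding of a symmetric matrix. The helpers `adjacency_spectral_embedding` and `sample_rdpg` are hypothetical names for this example, not library functions.

```python
import numpy as np


def adjacency_spectral_embedding(A, d):
    """Rank-d spectral embedding of a symmetric adjacency matrix."""
    eigvals, eigvecs = np.linalg.eigh(A)
    # Keep the d eigenpairs with the largest eigenvalues in magnitude.
    top = np.argsort(np.abs(eigvals))[::-1][:d]
    return eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))


def sample_rdpg(X, rng):
    """Sample an undirected, loopless graph with edge probabilities X X^T."""
    P = X @ X.T
    upper = np.triu(rng.random(P.shape) < P, k=1)
    return (upper + upper.T).astype(float)


rng = np.random.default_rng(0)
# Graphs with different numbers of vertices but a shared latent dimension.
A1 = sample_rdpg(rng.uniform(0.2, 0.7, size=(60, 2)), rng)
A2 = sample_rdpg(rng.uniform(0.2, 0.7, size=(45, 2)), rng)

X1 = adjacency_spectral_embedding(A1, d=2)  # shape (60, 2)
X2 = adjacency_spectral_embedding(A2, d=2)  # shape (45, 2)
print(X1.shape, X2.shape)
```

Arrays like X1 and X2 are the kind of input that fit and fit_predict expect in this mode; the row counts may differ, but the column count d must match.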

fit_predict(self, A1, A2)
Fits the test to the two input graphs and returns the p-value.
Parameters: A1, A2 : variable (see description)
The two graphs, or their embeddings, to run a hypothesis test on. The expected variable type and shape depend on the input_graph attribute:
 input_graph=True
 expects two unembedded graphs either as NetworkX graph objects or as two np.ndarrays representing the adjacency matrices. In this case the graphs will be embedded using adjacency spectral embedding.
 input_graph=False
 expects two already embedded graphs. In this case they must be arrays of shape (n, d) and (m, d), where d, the number of components, must be shared.
Note that regardless of how the graphs are passed, they need not have the same number of vertices.
Returns: p_value_ : float
The overall p value from the test
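The p value comes from comparing an observed two-sample statistic on the embeddings with a resampled null distribution. The following sketch conveys the idea with a permutation test on the energy distance; it is a stand-in chosen for brevity and is not the hyppo backend. The function names and the add-one p-value estimator are assumptions of this example.

```python
import numpy as np


def pairwise_dist(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)


def energy_statistic(X, Y):
    """Energy distance between two samples (V-statistic form)."""
    return (2 * pairwise_dist(X, Y).mean()
            - pairwise_dist(X, X).mean()
            - pairwise_dist(Y, Y).mean())


def permutation_test(X, Y, n_bootstraps=200, seed=0):
    rng = np.random.default_rng(seed)
    observed = energy_statistic(X, Y)
    pooled = np.vstack([X, Y])
    n = len(X)
    null = np.empty(n_bootstraps)
    for b in range(n_bootstraps):
        # Under the null the sample labels are exchangeable, so shuffle them.
        perm = rng.permutation(len(pooled))
        null[b] = energy_statistic(pooled[perm[:n]], pooled[perm[n:]])
    p_value = (np.sum(null >= observed) + 1) / (n_bootstraps + 1)
    return observed, null, p_value


rng = np.random.default_rng(1)
X = rng.normal(loc=0.0, size=(40, 2))  # stand-in for the first embedding
Y = rng.normal(loc=3.0, size=(50, 2))  # stand-in for the second embedding

observed, null, p_value = permutation_test(X, Y)
print(observed, p_value)
```

Here `observed`, `null`, and `p_value` play the roles of sample_T_statistic_, null_distribution_, and p_value_, and the two samples need not have the same number of rows.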

get_params(self, deep=True)
Get parameters for this estimator.
Parameters: deep : bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params : mapping of string to any
Parameter names mapped to their values.

set_params(self, **params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Parameters: **params : dict
Estimator parameters.
Returns: self : object
Estimator instance.