Inference

Two-graph hypothesis testing

class graspy.inference.SemiparametricTest(embedding='ase', n_components=None, n_bootstraps=500, test_case='rotation')[source]

Two-sample hypothesis test for the semiparametric problem of determining whether two random dot product graphs have the same latent positions [R1f3edd5bcc7e-1].

Currently, this class only supports undirected graphs.

Parameters:
embedding : string, {'ase' (default), 'omnibus'}

String describing the embedding method to use:

  • 'ase'
    Embed each graph separately using adjacency spectral embedding and use Procrustes to align the embeddings.
  • 'omnibus'
    Embed all graphs simultaneously using omnibus embedding.
n_components : None (default), or int

Number of embedding dimensions. If None, the optimal embedding dimensions are found by the Zhu and Ghodsi algorithm.

test_case : string, {'rotation' (default), 'scalar-rotation', 'diagonal-rotation'}

Describes the exact form of the hypothesis to test when using 'ase' as the embedding method. Ignored if using 'omnibus'. Given two sets of latent positions, \(X_1\) and \(X_2\), and an orthogonal rotation matrix \(R\) that minimizes \(||X_1 - X_2 R||_F\):

  • 'rotation'
    \[H_0: X_1 = X_2 R\]
  • 'scalar-rotation'
    \[H_0: X_1 = c X_2 R\]

    where c is a scalar, c > 0

  • 'diagonal-rotation'
    \[H_0: X_1 = D X_2 R\]

    where D is an arbitrary diagonal matrix

n_bootstraps : int, optional (default 500)

Number of bootstrap simulations to run to generate the null distribution.
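
To illustrate the 'rotation' case of test_case, the sketch below finds the orthogonal matrix R that minimizes ||X_1 - X_2 R||_F with a singular value decomposition and returns the resulting Frobenius distance. It uses only NumPy. The function name rotation_statistic and the toy latent positions are assumptions made for this example; it is not a copy of the library's internal code.

```python
import numpy as np


def rotation_statistic(X1, X2):
    """Return (||X1 - X2 R||_F, R) for the orthogonal R minimizing that norm.

    Hypothetical helper for illustration only.
    """
    # Orthogonal Procrustes: if X2^T X1 = U S V^T, the minimizer is R = U V^T.
    U, _, Vt = np.linalg.svd(X2.T @ X1)
    R = U @ Vt
    return np.linalg.norm(X1 - X2 @ R, ord="fro"), R


# Toy latent positions: X1 is an exact rotation of X2, so H_0 holds.
rng = np.random.default_rng(0)
X2 = rng.uniform(0.1, 0.6, size=(20, 2))
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta), np.cos(theta)]])
X1 = X2 @ R_true

stat, R = rotation_statistic(X1, X2)
```

Because X1 was built as a rotation of X2, stat is zero up to floating-point error. With estimated latent positions from two observed graphs the distance is positive even under the null, which is why it is compared against a bootstrap null distribution.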

Attributes:
null_distribution_1_, null_distribution_2_ : np.ndarray (n_bootstraps,)

The distributions of T statistics generated under the null, using the first and second input graph, respectively. The latent positions of each sample graph are used independently to sample random dot product graphs, so two null distributions are generated.

sample_T_statistic_ : float

The observed difference between the embedded positions of the two input graphs after alignment (the type of alignment depends on test_case).

p_value_1_, p_value_2_ : float

The p values estimated from the null distributions of sample 1 and sample 2, respectively.

p_ : float

The overall p value from the semiparametric test; this is the maximum of p_value_1_ and p_value_2_.
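
The relationship between the attributes above can be sketched as follows. This is a minimal illustration, assuming the plain empirical estimate "fraction of null statistics at least as large as the observed one"; the library's exact estimator may differ (for example, by a continuity correction). The helper name empirical_p_value and the simulated null distributions are hypothetical.

```python
import numpy as np


def empirical_p_value(null_distribution, observed):
    """Fraction of null statistics >= the observed statistic (illustrative)."""
    null = np.asarray(null_distribution, dtype=float)
    return float(np.mean(null >= observed))


# Stand-ins for null_distribution_1_, null_distribution_2_ and
# sample_T_statistic_; real values would come from a fitted test.
rng = np.random.default_rng(0)
null_1 = rng.normal(loc=1.0, scale=0.1, size=500)
null_2 = rng.normal(loc=1.1, scale=0.1, size=500)
observed = 1.2

p_1 = empirical_p_value(null_1, observed)
p_2 = empirical_p_value(null_2, observed)

# Taking the larger of the two p values makes the overall test conservative.
p = max(p_1, p_2)
```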

References

[R1f3edd5bcc7e-1] Tang, M., Athreya, A., Sussman, D. L., Lyzinski, V., Park, Y., & Priebe, C. E. (2017). "A semiparametric two-sample hypothesis testing problem for random graphs." Journal of Computational and Graphical Statistics, 26(2).

Examples

>>> from graspy.inference import SemiparametricTest
>>> spt = SemiparametricTest(n_components=2, test_case='rotation')
>>> p = spt.fit(A1, A2)  # A1, A2: the two input graphs

fit(self, A1, A2)[source]

Fits the test to the two input graphs.

Parameters:
A1, A2 : nx.Graph, nx.DiGraph, nx.MultiDiGraph, nx.MultiGraph, np.ndarray

The two graphs to run a hypothesis test on. If np.ndarray, shape must be (n_vertices, n_vertices) for both graphs, where n_vertices is the same for both.

Returns:
p : float

The p value corresponding to the specified hypothesis test.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns:
self

class graspy.inference.NonparametricTest(n_components=None, n_bootstraps=200, bandwidth=None)[source]

Two-sample hypothesis test for the nonparametric problem of determining whether two random dot product graphs have latent positions drawn from the same distribution [Rf7b962b90d24-2].

Currently, testing is only supported for undirected graphs.

Parameters:
n_components : int or None, optional (default=None)

Number of embedding dimensions. If None, the optimal embedding dimensions are found by the Zhu and Ghodsi algorithm.

n_bootstraps : int (default=200)

Number of bootstrap iterations.

bandwidth : float or None, optional (default=None)

Bandwidth to use for the Gaussian kernel. If None, the median heuristic will be used.
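
When n_components is None, the embedding dimension is chosen automatically. The sketch below gives a simplified version of the profile-likelihood idea behind the Zhu and Ghodsi method: split the sorted singular values into two groups at every candidate position, model each group as Gaussian with its own mean and a shared variance, and keep the split with the highest log-likelihood. The function name profile_likelihood_elbow is hypothetical, the pooled variance uses the plain maximum-likelihood denominator, and this is not the library's own routine.

```python
import numpy as np


def profile_likelihood_elbow(values):
    """Return the number of leading values kept by a two-group Gaussian fit.

    Simplified illustration of a profile-likelihood elbow search.
    """
    d = np.sort(np.asarray(values, dtype=float))[::-1]
    p = d.size
    if p < 2:
        raise ValueError("need at least two values to look for an elbow")

    log_liks = []
    for q in range(1, p):
        left, right = d[:q], d[q:]
        # Pooled sum of squares around each group's own mean.
        ss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        # Shared variance estimate, floored to avoid log(0).
        var = max(ss / p, 1e-12)
        log_liks.append(-0.5 * p * np.log(2 * np.pi * var) - ss / (2 * var))

    # Candidate q runs from 1, so shift the argmax index by one.
    return int(np.argmax(log_liks)) + 1


# Toy singular values with a clear gap after the third one.
singular_values = [10.0, 9.5, 9.0, 1.0, 0.8, 0.5, 0.3]
n_components = profile_likelihood_elbow(singular_values)
```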

Attributes:
sample_T_statistic_ : float

The observed difference between the embedded latent positions of the two input graphs.

p_ : float

The overall p value from the nonparametric test.

null_distribution_ : ndarray, shape (n_bootstraps, )

The distribution of T statistics generated under the null.
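
As a rough illustration of how a Gaussian kernel and its bandwidth enter a two-sample statistic, the sketch below computes a biased kernel estimate of the squared maximum mean discrepancy between two point clouds. It assumes one common form of the median heuristic (bandwidth equal to the median pairwise Euclidean distance) and ignores the orthogonal non-identifiability of latent positions. All function names are hypothetical and the library's statistic may be defined differently.

```python
import numpy as np


def squared_distances(X, Y):
    """Matrix of squared Euclidean distances between rows of X and rows of Y."""
    diffs = X[:, None, :] - Y[None, :, :]
    return np.sum(diffs ** 2, axis=2)


def median_heuristic(X, Y):
    """Median pairwise distance over the pooled sample (one common variant)."""
    Z = np.vstack([X, Y])
    sq = squared_distances(Z, Z)
    upper = np.triu_indices_from(sq, k=1)
    # Floor the result so identical points cannot give a zero bandwidth.
    return max(float(np.sqrt(np.median(sq[upper]))), 1e-12)


def mmd_statistic(X, Y, bandwidth=None):
    """Biased estimate of squared MMD with a Gaussian kernel."""
    if bandwidth is None:
        bandwidth = median_heuristic(X, Y)

    def kernel(A, B):
        return np.exp(-squared_distances(A, B) / (2.0 * bandwidth ** 2))

    return float(kernel(X, X).mean() - 2.0 * kernel(X, Y).mean() + kernel(Y, Y).mean())


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
Y_same = rng.normal(size=(40, 2))          # same distribution as X
Y_shifted = rng.normal(size=(40, 2)) + 3.0  # shifted distribution

t_same = mmd_statistic(X, Y_same)
t_shifted = mmd_statistic(X, Y_shifted)
```

The statistic is close to zero when both samples come from the same distribution and grows as the distributions separate, so t_shifted is expected to be noticeably larger than t_same here.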

References

[Rf7b962b90d24-2] Tang, M., Athreya, A., Sussman, D. L., Lyzinski, V., & Priebe, C. E. (2017). "A nonparametric two-sample hypothesis testing problem for random graphs." Bernoulli, 23(3), 1599-1630.

fit(self, A1, A2)[source]

Fits the test to the two input graphs.

Parameters:
A1, A2 : nx.Graph, nx.DiGraph, nx.MultiDiGraph, nx.MultiGraph, np.ndarray

The two graphs to run a hypothesis test on.

Returns:
p_ : float

The p value corresponding to the specified hypothesis test.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns:
self