# Latent Distribution Two-Graph Testing

[1]:

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(8888)

from graspy.inference import LatentDistributionTest
from graspy.simulations import sbm, rdpg
from graspy.utils import symmetrize
from graspy.plot import heatmap, pairplot

%matplotlib inline


## Generate a stochastic block model graph

We generate a graph from a 4-block stochastic block model (SBM); its adjacency matrix and its adjacency spectral embedding (ASE) are shown below.

[2]:

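from graspy.embed import AdjacencySpectralEmbed

n_components = 4 # the number of embedding dimensions for ASE
# Block probabilities, given as an upper-triangular matrix and then symmetrized
P = np.array([[0.9, 0.11, 0.13, 0.2],
              [0, 0.7, 0.1, 0.1],
              [0, 0, 0.8, 0.1],
              [0, 0, 0, 0.85]])

P = symmetrize(P)
csize = [50] * 4 # four blocks of 50 vertices each
A = sbm(csize, P)

# Embed A with ASE, then plot the adjacency matrix and the embedding
X = AdjacencySpectralEmbed(n_components=n_components).fit_transform(A)
heatmap(A, title='4-block SBM adjacency matrix A')
pairplot(X, title='4-block adjacency spectral embedding A')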
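To make the model concrete, here is a minimal numpy sketch of how an undirected, loopless SBM adjacency matrix could be sampled from block sizes and a symmetric block probability matrix. It is an illustration only and not graspy's `sbm` implementation; the helper name `sample_sbm`, its arguments, and the small 2-block example are hypothetical.

```python
import numpy as np

def sample_sbm(sizes, block_probs, rng):
    """Sample an undirected, loopless SBM adjacency matrix (hypothetical helper)."""
    block_probs = np.asarray(block_probs, dtype=float)
    # Community label of every vertex, e.g. [0, 0, ..., 1, 1, ...]
    labels = np.repeat(np.arange(len(sizes)), sizes)
    # Edge probability for every vertex pair, looked up from the block matrix
    edge_probs = block_probs[np.ix_(labels, labels)]
    n = labels.size
    # Draw each pair (i < j) once, then mirror it to keep the graph undirected
    upper = np.triu(rng.random((n, n)) < edge_probs, k=1)
    return (upper | upper.T).astype(int)

rng = np.random.default_rng(0)
B = np.array([[0.9, 0.1],
              [0.1, 0.7]])
adj = sample_sbm([30, 30], B, rng)

# Edge density inside the first block versus between the two blocks
print(adj[:30, :30].mean(), adj[:30, 30:].mean())
```

Each vertex pair is an independent Bernoulli draw whose probability depends only on the blocks of its two endpoints, which is why the heatmap of an SBM graph shows a block pattern.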



## Latent distribution test where null is true

Now we want to know whether the graph above (`A`) and a second graph (`A1`) drawn from the same SBM share the same distribution of latent positions. We know that they do, so the test should find that the difference between the two graphs (up to a rotation) is no greater than the difference expected by chance.

In other words, writing $F_1$ and $F_2$ for the distributions of the latent positions of the two graphs, we are testing

\begin{align*} H_0:& F_1 = F_2\\ H_A:& F_1 \neq F_2 \end{align*}

and want to see that the p-value for the unmatched test is high, so that we fail to reject the null.

We generate a second SBM graph in the same way and run an unmatched test on the pair. The test produces a distance between the two graphs as well as a null distribution of distances between permutations of the graphs. The second graph is shown below.

[3]:

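from graspy.embed import AdjacencySpectralEmbed
A1 = sbm(csize, P)
# Embed A1 with ASE so that X1 is defined for the pair plot
X1 = AdjacencySpectralEmbed(n_components=n_components).fit_transform(A1)
heatmap(A1, title='4-block SBM adjacency matrix A1')
pairplot(X1, title='4-block adjacency spectral embedding A1')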
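The pair plots in this tutorial show adjacency spectral embeddings. As a rough illustration of the idea, and of why comparisons are only meaningful up to a rotation, the sketch below embeds a symmetric matrix using its largest-magnitude eigenpairs. It is a simplified stand-in, not graspy's `AdjacencySpectralEmbed`; the function name `ase_sketch` and the toy latent positions are hypothetical.

```python
import numpy as np

def ase_sketch(adjacency, n_components):
    """Embed a symmetric matrix using its largest-magnitude eigenpairs (illustrative only)."""
    eigvals, eigvecs = np.linalg.eigh(adjacency)
    # Keep the n_components eigenvalues that are largest in absolute value
    top = np.argsort(np.abs(eigvals))[::-1][:n_components]
    # Scale each kept eigenvector by the square root of its eigenvalue magnitude
    return eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))

# Toy latent positions for two groups of vertices, and their edge probabilities
X_true = np.repeat(np.array([[0.9, 0.1], [0.3, 0.8]]), 5, axis=0)
P_true = X_true @ X_true.T

X_hat = ase_sketch(P_true, n_components=2)

# X_hat need not equal X_true, but it reproduces the same edge probabilities
print(np.allclose(X_hat @ X_hat.T, P_true))
```

Because any orthogonal transformation of the latent positions yields the same matrix of edge probabilities, two embeddings of graphs from the same model can differ by a rotation even when the underlying distributions are identical.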



## Plot of Null Distribution

We plot the null distribution in blue and the test statistic as a red vertical line. The test statistic is small relative to the null distribution, resulting in a p-value of 0.94. Thus, we cannot reject the null hypothesis that the two graphs come from the same generating distribution.

[4]:

ldt = LatentDistributionTest()
p = ldt.fit(A, A1)

fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(ldt.null_distribution_, 50)
ax.axvline(ldt.sample_T_statistic_, color='r')
ax.set_title("P-value = {}".format(p), fontsize=20)
plt.show();
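The histogram above compares one observed statistic against a null distribution. The same logic can be sketched with plain numpy on two point clouds that stand in for embedded latent positions. This is an illustration under stated assumptions, not graspy's implementation: it uses a Gaussian-kernel maximum mean discrepancy (MMD) as the distance, builds the null by reshuffling the pooled rows, and ignores the rotation alignment that real embeddings require. The function names, the bandwidth, and the number of permutations are hypothetical choices.

```python
import numpy as np

def gaussian_mmd(X, Y, bandwidth=0.5):
    """Biased estimate of the squared MMD between samples X and Y (Gaussian kernel)."""
    Z = np.vstack([X, Y])
    sq_dists = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dists / (2 * bandwidth ** 2))
    n = len(X)
    return K[:n, :n].mean() + K[n:, n:].mean() - 2 * K[:n, n:].mean()

def permutation_test(X, Y, n_permutations=200, seed=0):
    """Return the observed statistic, the permutation null, and an empirical p-value."""
    rng = np.random.default_rng(seed)
    observed = gaussian_mmd(X, Y)
    pooled = np.vstack([X, Y])
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        # Reassign rows to the two samples at random, as if the null were true
        shuffled = pooled[rng.permutation(len(pooled))]
        null[i] = gaussian_mmd(shuffled[:len(X)], shuffled[len(X):])
    # Fraction of null statistics at least as large as the observed one
    p_value = np.mean(null >= observed)
    return observed, null, p_value

rng = np.random.default_rng(1)
base = rng.normal(size=(100, 2))
same_stat, same_null, same_p = permutation_test(base, rng.normal(size=(100, 2)))
shift_stat, shift_null, shift_p = permutation_test(base, rng.normal(loc=1.0, size=(100, 2)))

print("same distribution:    statistic", same_stat, "p-value", same_p)
print("shifted distribution: statistic", shift_stat, "p-value", shift_p)
```

When the two samples share a distribution, the observed statistic tends to fall inside the bulk of the null histogram; when one sample is shifted, it lands far in the right tail and the p-value becomes very small.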


## Latent distribution test where null is false

We generate a second SBM with different block probabilities, and run a latent distribution test comparing the first graph `A` with the new one.

[5]:

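from graspy.embed import AdjacencySpectralEmbed

# Block probabilities that differ from P
P2 = np.array([[0.8, 0.2, 0.2, 0.5],
               [0, 0.9, 0.3, 0.2],
               [0, 0, 0.5, 0.2],
               [0, 0, 0, 0.5]])

P2 = symmetrize(P2)
A2 = sbm(csize, P2)
# Embed A2 with ASE so that X2 is defined for the pair plot
X2 = AdjacencySpectralEmbed(n_components=n_components).fit_transform(A2)
heatmap(A2, title='4-block SBM adjacency matrix A2')
pairplot(X2, title='4-block adjacency spectral embedding A2')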



## Plot of Null Distribution

We plot the null distribution in blue and the test statistic as a red vertical line. The test statistic is large relative to the null distribution, resulting in a p-value of 0. Thus, we reject the null hypothesis that the two graphs come from the same generating distribution.

[6]:

ldt = LatentDistributionTest()
p = ldt.fit(A, A2)

fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(ldt.null_distribution_, 50)
ax.axvline(ldt.sample_T_statistic_, color='r')
ax.set_title("P-value = {}".format(p), fontsize=20)
plt.show();
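A reported p-value of exactly 0 deserves a note. Assuming the p-value is estimated as the fraction of null statistics at least as large as the observed one (an assumption about the estimator, not a statement about graspy's internals), a value of 0 only means that none of the sampled null statistics reached the observed one. The sketch below uses synthetic numbers, not the output of the test above, and the helper `empirical_p_value` is hypothetical.

```python
import numpy as np

def empirical_p_value(null, observed, add_one=False):
    """Fraction of null statistics >= observed, optionally with an add-one correction."""
    null = np.asarray(null)
    exceed = np.sum(null >= observed)
    if add_one:
        # Counts the observed statistic as one more draw, so the result is never 0
        return (exceed + 1) / (null.size + 1)
    return exceed / null.size

rng = np.random.default_rng(0)
null = rng.normal(size=200)      # synthetic stand-in for a null distribution
observed = null.max() + 1.0      # a statistic beyond every null sample

p_plain = empirical_p_value(null, observed)
p_corrected = empirical_p_value(null, observed, add_one=True)
print(p_plain)      # 0.0
print(p_corrected)  # 1 / 201
```

With 200 null samples the plain estimate cannot distinguish p-values below 1/200, so a result of 0 is better read as "smaller than the resolution of the null sample" than as a literal probability of zero.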