Adjacency Spectral Embed

This demos shows how to use the Adjacency Spectral Embed (ASE) class. We then use ASE to show how two communities from a stochastic block model graph can be found in the embedded space using k-means.

[1]:
import numpy as np
np.random.seed(8889)
from sklearn.metrics import adjusted_rand_score
from sklearn.cluster import KMeans
from graspy.embed import AdjacencySpectralEmbed
from graspy.simulations import sbm
from graspy.plot import heatmap, pairplot

Using ASE on simple stochastic block model (SBM) graphs

ASE is a method for estimating the latent positions of a network modeled as a Random Dot Product Graph (RDPG). This embedding is both a form of dimensionality reduction for a graph, and a way of fitting a generative model to graph data.

[2]:
n_verts = 100
labels_sbm = n_verts * [0] + n_verts * [1]
sizes = 2*[n_verts]
P = np.array([[0.8, 0.2], [0.2, 0.8]])
undirected_sbm = sbm(sizes, P)
heatmap(undirected_sbm, title='2-block SBM (undirected)')
directed_sbm = sbm(sizes, P, directed=True)
heatmap(directed_sbm, title='2-block SBM (directed)')
[2]:
<matplotlib.axes._subplots.AxesSubplot at 0x169e423e390>
../../_images/tutorials_embedding_AdjacencySpectralEmbed_3_1.png
../../_images/tutorials_embedding_AdjacencySpectralEmbed_3_2.png

We can use the AdjacencySpectralEmbed class to embed the adjacency matrix as follows. If no parameters are given to the AdjacencySpectralEmbed class, it will automatically choose the number of dimensions to embed into

[3]:
ase = AdjacencySpectralEmbed()
Xhat = ase.fit_transform(undirected_sbm)
pairplot(Xhat, title='SBM adjacency spectral embedding')
[3]:
<seaborn.axisgrid.PairGrid at 0x169e45f6630>
../../_images/tutorials_embedding_AdjacencySpectralEmbed_5_1.png

If the graph is directed, we will get two outputs roughly corresponding to the “out” and “in” latent positions, since these are no longer the same.

[4]:
ase = AdjacencySpectralEmbed()
Xhat, Yhat = ase.fit_transform(directed_sbm)
pairplot(Xhat, title='SBM adjacency spectral embedding "out"')
pairplot(Yhat, title='SBM adjacency spectral embedding "in"')
[4]:
<seaborn.axisgrid.PairGrid at 0x169e4ae4278>
../../_images/tutorials_embedding_AdjacencySpectralEmbed_7_1.png
../../_images/tutorials_embedding_AdjacencySpectralEmbed_7_2.png

One can also specify the parameters for embedding, for example, by manually inputting the number of embedded dimensions and changing the SVD solver used to compute the embedding

[5]:
ase = AdjacencySpectralEmbed(n_components=2, algorithm='truncated')
Xhat = ase.fit_transform(undirected_sbm)
pairplot(Xhat, title='2-component embedding', height=4)
[5]:
<seaborn.axisgrid.PairGrid at 0x169e92bf438>
../../_images/tutorials_embedding_AdjacencySpectralEmbed_9_1.png

Clustering in the embedded space

Now, we will use the Euclidian representation of the graph to apply a standard clustering algorithm like k-means. We start with an SBM model where the 2 blocks have the exact same connection probabilities (effectively giving us an ER model graph). In this case, k-means will not be able to distinguish among the two embedded blocks. As the connections between the blocks become more distinct, the clustering will improve. For each graph, we plot its adjacency matrix, the predicted k-means cluster labels in the embedded space, and the error as a function of the true labels. Adjusted Rand Index (ARI) is a measure of clustering accuracy, where 1 is perfect clustering relative to ground truth. Error rate is simply the proportion of correctly labeled nodes.

[6]:
palette = {'Right':(0,0.7,0.2),
           'Wrong':(0.8,0.1,0.1)}
for insularity in np.linspace(0.5, 0.625, 5):
    P = np.array([[insularity, 1-insularity], [1-insularity, insularity]])
    sampled_sbm = sbm(sizes, P)
    heatmap(sampled_sbm, title='Insularity: {}'.format(str(insularity)[:5]))
    Xhat = AdjacencySpectralEmbed(n_components=2).fit_transform(sampled_sbm)
    labels_kmeans = KMeans(n_clusters=2).fit_predict(Xhat)
    ari = adjusted_rand_score(labels_sbm, labels_kmeans)
    error = labels_sbm - labels_kmeans
    error = error != 0
    # sometimes the labels given by kmeans will be the inverse of ours
    if np.sum(error) / (2 * n_verts) > 0.5:
        error = error == 0
    error_rate = np.sum(error) / (2 * n_verts)
    error_label = (2 * n_verts) * ['Right']
    error_label = np.array(error_label)
    error_label[error] = 'Wrong'

    pairplot(Xhat,
             Y=labels_kmeans,
             title='KMeans on embedding, ARI: {}'.format(str(ari)[:5]),
             legend_name='Predicted label',
             height=3.5,
             palette='muted')
    pairplot(Xhat,
             Y=error_label,
             title='Error from KMeans, Error rate: {}'.format(str(error_rate)),
             legend_name='Error label',
             height=3.5,
             palette=palette)
../../_images/tutorials_embedding_AdjacencySpectralEmbed_11_0.png
../../_images/tutorials_embedding_AdjacencySpectralEmbed_11_1.png
../../_images/tutorials_embedding_AdjacencySpectralEmbed_11_2.png
../../_images/tutorials_embedding_AdjacencySpectralEmbed_11_3.png
../../_images/tutorials_embedding_AdjacencySpectralEmbed_11_4.png
../../_images/tutorials_embedding_AdjacencySpectralEmbed_11_5.png
../../_images/tutorials_embedding_AdjacencySpectralEmbed_11_6.png
../../_images/tutorials_embedding_AdjacencySpectralEmbed_11_7.png
../../_images/tutorials_embedding_AdjacencySpectralEmbed_11_8.png
../../_images/tutorials_embedding_AdjacencySpectralEmbed_11_9.png
../../_images/tutorials_embedding_AdjacencySpectralEmbed_11_10.png
../../_images/tutorials_embedding_AdjacencySpectralEmbed_11_11.png
../../_images/tutorials_embedding_AdjacencySpectralEmbed_11_12.png
../../_images/tutorials_embedding_AdjacencySpectralEmbed_11_13.png
../../_images/tutorials_embedding_AdjacencySpectralEmbed_11_14.png