Out-of-Sample (OOS) Embedding
Here, an “adjacency vector” \(w\) is a vector with \(n\) elements, where \(n\) is the number of in-sample vertices. In the unweighted case, it has a 1 in the \(i^{th}\) position if the out-of-sample vertex shares an edge with in-sample vertex \(i\), and a 0 otherwise.
\(W \in \textbf{R}^{m \times n}\) is a matrix with each row being an adjacency vector, for \(m\) out-of-sample vertices.
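As a small illustration of these definitions, the sketch below builds \(w\) and \(W\) by hand with NumPy. The 5-vertex graph and the names full, A_small, W_small, and w_small are made up for this example and are not part of graspologic.

```python
import numpy as np

# Hypothetical 5-vertex undirected, unweighted graph (symmetric, no self-loops).
full = np.array([
    [0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0],
])

in_sample = [0, 1, 2]   # n = 3 in-sample vertices
out_sample = [3, 4]     # m = 2 out-of-sample vertices

# In-sample adjacency matrix (n x n)
A_small = full[np.ix_(in_sample, in_sample)]

# W has one adjacency vector per out-of-sample vertex (m x n)
W_small = full[np.ix_(out_sample, in_sample)]

# Adjacency vector of the first out-of-sample vertex (length n)
w_small = W_small[0]

print(A_small.shape, W_small.shape, w_small)
```

Here w_small is [0, 0, 1]: out-of-sample vertex 3 shares an edge only with in-sample vertex 2.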
Either \(w\) or \(W\) can be passed to a fitted embedding's transform method.
[1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from numpy.random import normal, poisson
from graspologic.simulations import sbm
from graspologic.embed import AdjacencySpectralEmbed as ASE
from graspologic.plot import heatmap, pairplot
from graspologic.utils import remove_vertices
np.random.seed(9002)
import warnings
warnings.filterwarnings('ignore')
Undirected out-of-sample prediction
Here, we embed an undirected two-block stochastic block model with ASE. We then use its transform method to find an out-of-sample prediction for both a single vertex and multiple vertices.
We begin by generating data.
[2]:
# Generate parameters
nodes_per_community = 100
P = np.array([[0.8, 0.2],
              [0.2, 0.8]])
# Generate an undirected Stochastic Block Model (SBM)
undirected, labels_ = sbm(2*[nodes_per_community], P, return_labels=True)
labels = list(labels_)
# Grab out-of-sample vertices
oos_idx = 0
oos_labels = labels.pop(oos_idx)
A, a = remove_vertices(undirected, indices=oos_idx, return_removed=True)
# plot our SBM
heatmap(A, title=f'2-block SBM (undirected), shape {A.shape}', inner_hier_labels=labels);
[2]:
<AxesSubplot:title={'center':'2-block SBM (undirected), shape (199, 199)'}>

Embedding
We then generate an embedding with ASE and use its transform method to determine our best estimate for the latent position of the out-of-sample vertex.
[3]:
# Generate an embedding with ASE
ase = ASE(n_components=2)
X_hat = ase.fit_transform(A)
# predicted latent positions
w = ase.transform(a)
w
[3]:
array([0.71754581, 0.51499 ])
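One way to read this output: if the in-sample adjacency matrix is approximated by \(\hat{X}\hat{X}^T\), the adjacency vector of an out-of-sample vertex should be roughly \(\hat{X}\) times its latent position, so that position can be estimated by least squares. The sketch below shows the idea with plain NumPy on a toy graph. It is an illustration under that assumption, not a copy of graspologic's implementation; the helper toy_sbm and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_sbm(n_per_block, p_within, p_between, rng):
    """Sample a symmetric, hollow 0/1 adjacency matrix for a 2-block SBM (hypothetical helper)."""
    z = np.repeat([0, 1], n_per_block)
    probs = np.where(z[:, None] == z[None, :], p_within, p_between)
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    return (upper | upper.T).astype(float)

G = toy_sbm(50, 0.8, 0.2, rng)

# Hold out vertex 0: A_in is the in-sample graph, a_oos its adjacency vector.
A_in = G[1:, 1:]
a_oos = G[0, 1:]

# Rank-2 spectral embedding of the in-sample graph via SVD.
U, S, _ = np.linalg.svd(A_in)
X_in = U[:, :2] * np.sqrt(S[:2])

# Least-squares estimate: find w_ls minimizing ||X_in @ w_ls - a_oos||.
w_ls, *_ = np.linalg.lstsq(X_in, a_oos, rcond=None)

# Because the columns of X_in are orthogonal, the solution has a closed form.
w_closed = (a_oos @ X_in) / S[:2]

print(w_ls, w_closed)
```

The two estimates agree up to floating-point error, which shows that the projection only needs the in-sample embedding and its singular values.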
Plotting out-of-sample embedding
[4]:
def plot_oos(X_hat, oos_vertices, labels, oos_labels, title):
    # Plot the in-sample latent positions
    plot = pairplot(X_hat, labels=labels, title=title)
    # generate an out-of-sample dataframe
    oos_vertices = np.atleast_2d(oos_vertices)
    data = {'Type': oos_labels,
            'Dimension 1': oos_vertices[:, 0],
            'Dimension 2': oos_vertices[:, 1]}
    oos_df = pd.DataFrame(data=data)
    # update plot with out-of-sample latent positions,
    # plotting out-of-sample latent positions as stars
    plot.data = oos_df
    plot.hue_vals = oos_df["Type"]
    plot.map_offdiag(sns.scatterplot, s=500,
                     marker="*", edgecolor="black")
    plot.tight_layout()
    return plot

# Plot all latent positions
plot_oos(X_hat, w, labels=labels, oos_labels=[0], title="Out-of-Sample Embeddings (2-block SBM)");
[4]:
<seaborn.axisgrid.PairGrid at 0x7f29cd831290>

Passing in multiple out-of-sample vertices
You can pass a 2d numpy array into transform. The rows are the out-of-sample vertices, and the columns are their edges to the in-sample vertices.
[5]:
# Grab out-of-sample vertices
labels = list(labels_)
oos_idx = [0, -1]
oos_labels = [labels.pop(i) for i in oos_idx]
A, a = remove_vertices(undirected, indices=oos_idx, return_removed=True)
# our out-of-sample array is m x n
print(f"a is {type(a)} with shape {a.shape}")
a is <class 'numpy.ndarray'> with shape (2, 198)
[6]:
# Generate an embedding with ASE
ase = ASE(n_components=2)
X_hat = ase.fit_transform(A)
# predicted latent positions
w = ase.transform(a)
print(f"The out-of-sample prediction output has dimensions {w.shape}\n")
# Plot all latent positions
plot_oos(X_hat, w, labels, oos_labels=oos_labels,
         title="Out-of-Sample Embeddings (2-block SBM)");
The out-of-sample prediction output has dimensions (2, 2)
[6]:
<seaborn.axisgrid.PairGrid at 0x7f293f886bd0>

Directed out-of-sample prediction
Not all graphs are undirected. When finding out-of-sample latent positions for directed graphs, \(A \in \textbf{R}^{n \times n}\) is not symmetric. \(A_{i,j}\) represents the edge from node \(i\) to node \(j\), whereas \(A_{j, i}\) represents the edge from node \(j\) to node \(i\).
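To make the asymmetry concrete, the sketch below pulls the out-edges and in-edges of one vertex from a small directed adjacency matrix. The 4-vertex matrix and the variable names are invented for illustration.

```python
import numpy as np

# Hypothetical directed graph on 4 vertices: D[i, j] = 1 means an edge i -> j.
D = np.array([
    [0, 1, 0, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
])

oos = 3           # treat vertex 3 as out-of-sample
keep = [0, 1, 2]  # in-sample vertices

A_dir = D[np.ix_(keep, keep)]  # in-sample adjacency matrix, not symmetric
out_vec = D[oos, keep]         # edges from the out-of-sample vertex to in-sample vertices
in_vec = D[keep, oos]          # edges from in-sample vertices to the out-of-sample vertex

print(out_vec, in_vec)
```

In this toy graph out_vec is [0, 1, 1] and in_vec is [1, 0, 0], so a directed out-of-sample vertex needs two adjacency vectors rather than one.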
For a directed graph, pass a tuple of (out_oos, in_oos) adjacency arrays to the transform method. It then outputs a tuple of (out_latent_prediction, in_latent_prediction).
[7]:
# Generate a directed SBM
directed = sbm(2*[nodes_per_community], P, directed=True)
oos_idx = [0, -1]
# a is a tuple of (out_oos, in_oos)
A, a = remove_vertices(directed, indices=oos_idx, return_removed=True)
# Plot the new adjacency matrix
heatmap(A, title=f'2-block SBM (directed), shape {A.shape}');
[7]:
<AxesSubplot:title={'center':'2-block SBM (directed), shape (198, 198)'}>

[8]:
# Fit our directed graph
X_hat, Y_hat = ase.fit_transform(A)
# predicted latent positions
w = ase.transform(a)
print(f"output of `ase.transform(a)` is {type(w)}", "\n")
print(f"out latent positions: \n{w[0]}\n")
print(f"in latent positions: \n{w[1]}")
output of `ase.transform(a)` is <class 'tuple'>
out latent positions:
[[ 0.67377785 0.56976739]
[ 0.64340223 -0.42217671]]
in latent positions:
[[ 0.73600352 0.54927494]
[ 0.68732202 -0.52651664]]
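The two outputs can be understood with a least-squares argument, assuming the directed graph is approximated by \(\hat{X}\hat{Y}^T\): the out-edges of a new vertex are regressed on the in latent positions \(\hat{Y}\), and its in-edges on the out latent positions \(\hat{X}\). The following NumPy sketch illustrates this on a random toy graph. It is not graspologic's code, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical directed 2-block graph: D[i, j] = 1 means an edge i -> j.
z = np.repeat([0, 1], 40)
probs = np.where(z[:, None] == z[None, :], 0.8, 0.2)
D = (rng.random(probs.shape) < probs).astype(float)
np.fill_diagonal(D, 0)

# Hold out the last vertex.
A_dir = D[:-1, :-1]
out_vec = D[-1, :-1]  # edges from the held-out vertex
in_vec = D[:-1, -1]   # edges into the held-out vertex

# Rank-2 SVD embedding: A_dir is approximated by X_out @ Y_in.T
U, S, Vt = np.linalg.svd(A_dir)
X_out = U[:, :2] * np.sqrt(S[:2])
Y_in = Vt[:2].T * np.sqrt(S[:2])

# out_vec[j] ~ x_new . Y_in[j], so regress out_vec on Y_in (and in_vec on X_out).
x_new, *_ = np.linalg.lstsq(Y_in, out_vec, rcond=None)
y_new, *_ = np.linalg.lstsq(X_out, in_vec, rcond=None)

print("out latent position:", x_new)
print("in latent position: ", y_new)
```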
Plotting directed latent predictions
[9]:
plot_oos(X_hat, w[0], labels, oos_labels=oos_labels, title="Out Latent Predictions")
plot_oos(Y_hat, w[1], labels, oos_labels=oos_labels, title="In Latent Predictions")
[9]:
<seaborn.axisgrid.PairGrid at 0x7f293fe22e90>


Weighted out-of-sample prediction
Weighted graphs work as well. Here, we generate a directed, weighted graph and estimate the latent positions for multiple out-of-sample vertices.
[10]:
# Generate a weighted, directed SBM
wt = [[normal, poisson],
      [poisson, normal]]
wtargs = [[dict(loc=3, scale=1), dict(lam=5)],
          [dict(lam=5), dict(loc=3, scale=1)]]
weighted = sbm(2*[nodes_per_community], P, wt=wt, wtargs=wtargs, directed=True)
# Generate out-of-sample vertices
oos_idx = [0, -1]
A, a = remove_vertices(weighted, indices=oos_idx, return_removed=True)
# Plot our weighted, directed SBM
heatmap(A, title=f'2-block SBM (directed, weighted), shape {A.shape}')
[10]:
<AxesSubplot:title={'center':'2-block SBM (directed, weighted), shape (198, 198)'}>
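For intuition about what the wt and wtargs arguments describe, the sketch below attaches weights to the edges of a single block by multiplying a 0/1 edge mask with draws from a weight distribution. This is a simplified assumption about the idea, written with plain NumPy. It does not reproduce the internals of sbm, and the names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

n = 6    # vertices in this toy block
p = 0.8  # edge probability within the block

# 0/1 edge mask for one directed block (no self-loops).
mask = (rng.random((n, n)) < p).astype(float)
np.fill_diagonal(mask, 0)

# Draw one weight per potential edge, here Normal(loc=3, scale=1),
# and keep weights only where an edge exists.
weights = rng.normal(loc=3, scale=1, size=(n, n))
block = mask * weights

# A weighted adjacency vector for vertex 0 holds edge weights, not just 0/1.
w_weighted = block[0, 1:]
print(w_weighted)
```

The out-of-sample workflow itself is unchanged in the weighted case: the adjacency vectors simply carry edge weights in place of ones.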

[11]:
# Embed and transform
X_hat, Y_hat = ase.fit_transform(A)
w = ase.transform(a)
# Plot
plot_oos(X_hat, w[0], labels, oos_labels=oos_labels, title="Out Latent Predictions")
plot_oos(Y_hat, w[1], labels, oos_labels=oos_labels, title="In Latent Predictions")
[11]:
<seaborn.axisgrid.PairGrid at 0x7f2934f0c490>

