Models

Erdos-Reyni models

class graspy.models.EREstimator(directed=True, loops=False)[source]

Erdos-Reyni Model

The Erdos-Reyni (ER) model is a simple random graph model in which the probability of any potential edge in the graph existing is the same for any two nodes \(i\) and \(j\).

\(P_{ij} = p\) for all i, j

Read more in the tutorials

Parameters:
directed : boolean, optional (default=True)

Whether to treat the input graph as directed. Even if a directed graph is inupt, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed.

loops : boolean, optional (default=False)

Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.

Attributes:
p_ : float

Value between 0 and 1 (inclusive) representing the probability of any edge in the ER graph model

p_mat_ : np.ndarray, shape (n_verts, n_verts)

Probability matrix \(P\) for the fit model, from which graphs could be sampled.

References

[Rce4f95a7a410-1]https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model
fit(self, graph, y=None)[source]

Fit the SBM to a graph, optionally with known block labels

If y is None, the block assignments for each vertex will first be estimated.

Parameters:
graph : array_like or networkx.Graph

Input graph to fit

y : array_like, length graph.shape[0], optional

Categorical labels for the block assignments of the graph

bic(self, graph)

Bayesian information criterion for the current model on the input graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph

Returns:
bic : float

The lower the better

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

mse(self, graph)

Compute mean square error for the current model on the input graph

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph

Returns:
mse : float

Mean square error for the model's fit P matrix

sample(self, n_samples=1)

Sample graphs (realizations) from the fitted model

Can only be called after the the model has been fit

Parameters:
n_samples : int (default 1), optional

The number of graphs to sample

Returns:
graphs : np.array (n_samples, n_verts, n_verts)

Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.

Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.

score(self, graph)

Compute the average log-likelihood over each potential edge of the given graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph. Must be same shape as model's p_mat_ attribute

Returns:
score : float

sum of log-loglikelihoods for each potential edge in input graph

score_samples(self, graph, clip=None)

Compute the weighted log probabilities for each potential edge.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph. Must be same shape as model's p_mat_ attribute

clip : scalar or None, optional (default=None)

Values for which to clip probability matrix, entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.

Returns:
sample_scores : np.ndarray (size of graph)

log-likelihood per potential edge in the graph

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns:
self
class graspy.models.DCEREstimator(directed=True, loops=False, degree_directed=False)[source]

Degree-corrected Erdos-Reyni Model

The Degree-corrected Erdos-Reyni (DCER) model is an extension of the ER model in which each node has an additional "promiscuity" parameter \(\theta_i\) that determines its expected degree in the graph.

\(P_{ij} = \theta_i \theta_j p\)

Read more in the tutorials

Parameters:
directed : boolean, optional (default=True)

Whether to treat the input graph as directed. Even if a directed graph is inupt, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed.

loops : boolean, optional (default=False)

Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.

degree_directed : boolean

Whether to allow seperate degree correction parameters for the in and out degree of each node. Ignored if directed is False.

Attributes:
p_ : float

The \(p\) parameter as described in the above model, which weights the overall probability of connections between any two nodes.

p_mat_ : np.ndarray, shape (n_verts, n_verts)

Probability matrix \(P\) for the fit model, from which graphs could be sampled.

degree_corrections_ : np.ndarray, shape (n_verts, 1) or (n_verts, 2)

Degree correction vector(s) \(\theta\). If degree_directed parameter was False, then will be of shape (n_verts, 1) and element i represents the degree correction for node \(i\). Otherwise, the first column contains out degree corrections and the second column contains in degree corrections.

Notes

The DCER model is rarely mentioned in literature, though it is simply a special case of the DCSBM where there is only one community.

References

[R52fcc65416a4-1]https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model
[R52fcc65416a4-2]Karrer, B., & Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical review E, 83(1), 016107.
fit(self, graph, y=None)[source]

Fit the DCSBM to a graph, optionally with known block labels

If y is None, the block assignments for each vertex will first be estimated.

Parameters:
graph : array_like or networkx.Graph

Input graph to fit

y : array_like, length graph.shape[0], optional

Categorical labels for the block assignments of the graph

Returns:
self : DCSBMEstimator object

Fitted instance of self

bic(self, graph)

Bayesian information criterion for the current model on the input graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph

Returns:
bic : float

The lower the better

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

mse(self, graph)

Compute mean square error for the current model on the input graph

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph

Returns:
mse : float

Mean square error for the model's fit P matrix

sample(self, n_samples=1)

Sample graphs (realizations) from the fitted model

Can only be called after the the model has been fit

Parameters:
n_samples : int (default 1), optional

The number of graphs to sample

Returns:
graphs : np.array (n_samples, n_verts, n_verts)

Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.

Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.

score(self, graph)

Compute the average log-likelihood over each potential edge of the given graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph. Must be same shape as model's p_mat_ attribute

Returns:
score : float

sum of log-loglikelihoods for each potential edge in input graph

score_samples(self, graph, clip=None)

Compute the weighted log probabilities for each potential edge.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph. Must be same shape as model's p_mat_ attribute

clip : scalar or None, optional (default=None)

Values for which to clip probability matrix, entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.

Returns:
sample_scores : np.ndarray (size of graph)

log-likelihood per potential edge in the graph

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns:
self

Stochastic block models

class graspy.models.SBMEstimator(directed=True, loops=False, n_components=None, min_comm=1, max_comm=10, cluster_kws={}, embed_kws={})[source]

Stochastic Block Model

The stochastic block model (SBM) represents each node as belonging to a block (or community). For a given potential edge between node \(i\) and \(j\), the probability of an edge existing is specified by the block that nodes \(i\) and \(j\) belong to:

\(P_{ij} = B_{\tau_i \tau_j}\)

where \(B \in \mathbb{[0, 1]}^{K x K}\) and \(\tau\) is an n_nodes length vector specifying which block each node belongs to.

Read more in the tutorials

Parameters:
directed : boolean, optional (default=True)

Whether to treat the input graph as directed. Even if a directed graph is inupt, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed.

loops : boolean, optional (default=False)

Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.

n_components : int, optional (default=None)

Desired dimensionality of embedding for clustering to find communities. n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by select_dimension`().

min_comm : int, optional (default=1)

The minimum number of communities (blocks) to consider.

max_comm : int, optional (default=10)

The maximum number of communities (blocks) to consider (inclusive).

cluster_kws : dict, optional (default={})

Additional kwargs passed down to GaussianCluster

embed_kws : dict, optional (default={})

Additional kwargs passed down to AdjacencySpectralEmbed

Attributes:
block_p_ : np.ndarray, shape (n_blocks, n_blocks)

The block probability matrix \(B\), where the element \(B_{i, j}\) represents the probability of an edge between block \(i\) and block \(j\).

p_mat_ : np.ndarray, shape (n_verts, n_verts)

Probability matrix \(P\) for the fit model, from which graphs could be sampled.

vertex_assignments_ : np.ndarray, shape (n_verts)

A vector of integer labels corresponding to the predicted block that each node belongs to if y was not passed during the call to fit.

block_weights_ : np.ndarray, shape (n_blocks)

Contains the proportion of nodes that belong to each block in the fit model.

References

[R09567a76fb04-1]Holland, P. W., Laskey, K. B., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social networks, 5(2), 109-137.
fit(self, graph, y=None)[source]

Fit the SBM to a graph, optionally with known block labels

If y is None, the block assignments for each vertex will first be estimated.

Parameters:
graph : array_like or networkx.Graph

Input graph to fit

y : array_like, length graph.shape[0], optional

Categorical labels for the block assignments of the graph

bic(self, graph)

Bayesian information criterion for the current model on the input graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph

Returns:
bic : float

The lower the better

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

mse(self, graph)

Compute mean square error for the current model on the input graph

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph

Returns:
mse : float

Mean square error for the model's fit P matrix

sample(self, n_samples=1)

Sample graphs (realizations) from the fitted model

Can only be called after the the model has been fit

Parameters:
n_samples : int (default 1), optional

The number of graphs to sample

Returns:
graphs : np.array (n_samples, n_verts, n_verts)

Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.

Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.

score(self, graph)

Compute the average log-likelihood over each potential edge of the given graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph. Must be same shape as model's p_mat_ attribute

Returns:
score : float

sum of log-loglikelihoods for each potential edge in input graph

score_samples(self, graph, clip=None)

Compute the weighted log probabilities for each potential edge.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph. Must be same shape as model's p_mat_ attribute

clip : scalar or None, optional (default=None)

Values for which to clip probability matrix, entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.

Returns:
sample_scores : np.ndarray (size of graph)

log-likelihood per potential edge in the graph

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns:
self
class graspy.models.DCSBMEstimator(degree_directed=False, directed=True, loops=False, n_components=None, min_comm=1, max_comm=10, cluster_kws={}, embed_kws={})[source]

Degree-corrected Stochastic Block Model

The degree-corrected stochastic block model (DCSBM) represents each node as belonging to a block (or community). For a given potential edge between node \(i\) and \(j\), the probability of an edge existing is specified by the block that nodes \(i\) and \(j\) belong to as in the SBM. However, an additional "promiscuity" parameter \(\theta\) is added for each node, allowing the vertices within a block to have heterogeneous expected degree distributions:

\(P_{ij} = \theta_i \theta_j B_{\tau_i \tau_j}\)

where \(B \in \mathbb{[0, 1]}^{K x K}\) \(\tau\) is an n_nodes length vector specifying which block each node belongs to, and \(\theta\) is an n_nodes length vector specifiying the degree correction for each node.

The degree_directed parameter of this model allows the degree correction parameter to be different for the in and out degree of each node:

\(P_{ij} = \theta_i \eta_j B_{\tau_i \tau_j}\)

where \(\theta\) and \(\eta\) need not be the same.

Read more in the tutorials

Parameters:
directed : boolean, optional (default=True)

Whether to treat the input graph as directed. Even if a directed graph is inupt, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed.

degree_directed : boolean, optional (default=False)

Whether to fit an "in" and "out" degree correction for each node. In the degree_directed case, the fit model can have a different expected in and out degree for each node.

loops : boolean, optional (default=False)

Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.

n_components : int, optional (default=None)

Desired dimensionality of embedding for clustering to find communities. n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by select_dimension`().

min_comm : int, optional (default=1)

The minimum number of communities (blocks) to consider.

max_comm : int, optional (default=10)

The maximum number of communities (blocks) to consider (inclusive).

cluster_kws : dict, optional (default={})

Additional kwargs passed down to GaussianCluster

embed_kws : dict, optional (default={})

Additional kwargs passed down to LaplacianSpectralEmbed

Attributes:
block_p_ : np.ndarray, shape (n_blocks, n_blocks)

The block probability matrix \(B\), where the element \(B_{i, j}\) represents the expected number of edges between block \(i\) and block \(j\).

p_mat_ : np.ndarray, shape (n_verts, n_verts)

Probability matrix \(P\) for the fit model, from which graphs could be sampled.

degree_corrections_ : np.ndarray, shape (n_verts, 1) or (n_verts, 2)

Degree correction vector(s) \(\theta\). If degree_directed parameter was False, then will be of shape (n_verts, 1) and element \(i\) represents the degree correction for node \(i\). Otherwise, the first column contains out degree corrections and the second column contains in degree corrections.

vertex_assignments_ : np.ndarray, shape (n_verts)

A vector of integer labels corresponding to the predicted block that each node belongs to if y was not passed during the call to fit.

block_weights_ : np.ndarray, shape (n_blocks)

Contains the proportion of nodes that belong to each block in the fit model.

Notes

Note that many examples in the literature describe the DCSBM as being sampled with a Poisson distribution. Here, we implement this model with a Bernoulli. When individual edge probabilities are relatively low these two distributions will yield similar results.

References

[R3dc0ad7dd21a-1]Karrer, B., & Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical review E, 83(1), 016107.
fit(self, graph, y=None)[source]

Fit the DCSBM to a graph, optionally with known block labels

If y is None, the block assignments for each vertex will first be estimated.

Parameters:
graph : array_like or networkx.Graph

Input graph to fit

y : array_like, length graph.shape[0], optional

Categorical labels for the block assignments of the graph

Returns:
self : DCSBMEstimator object

Fitted instance of self

bic(self, graph)

Bayesian information criterion for the current model on the input graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph

Returns:
bic : float

The lower the better

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

mse(self, graph)

Compute mean square error for the current model on the input graph

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph

Returns:
mse : float

Mean square error for the model's fit P matrix

sample(self, n_samples=1)

Sample graphs (realizations) from the fitted model

Can only be called after the the model has been fit

Parameters:
n_samples : int (default 1), optional

The number of graphs to sample

Returns:
graphs : np.array (n_samples, n_verts, n_verts)

Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.

Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.

score(self, graph)

Compute the average log-likelihood over each potential edge of the given graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph. Must be same shape as model's p_mat_ attribute

Returns:
score : float

sum of log-loglikelihoods for each potential edge in input graph

score_samples(self, graph, clip=None)

Compute the weighted log probabilities for each potential edge.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph. Must be same shape as model's p_mat_ attribute

clip : scalar or None, optional (default=None)

Values for which to clip probability matrix, entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.

Returns:
sample_scores : np.ndarray (size of graph)

log-likelihood per potential edge in the graph

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns:
self

Latent position models

class graspy.models.RDPGEstimator(loops=False, n_components=None, ase_kws={}, diag_aug_weight=1, plus_c_weight=1)[source]

Random Dot Product Graph

Under the random dot product graph model, each node is assumed to have a "latent position" in some \(d\)-dimensional Euclidian space. This vector dictates that node's probability of connection to other nodes. For a given pair of nodes \(i\) and \(j\), the probability of connection is the dot product between their latent positions:

\(P_{ij} = \langle x_i, y_j \rangle\)

where \(x_i\) is the left latent position of node \(i\), and \(y_j\) is the right latent position of node \(j\). If the graph being modeled is is undirected, then \(x_i = y_i\). Latent positions can be estimated via AdjacencySpectralEmbed.

Read more in the tutorials

Parameters:
loops : boolean, optional (default=False)

Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.

n_components : int, optional (default=None)

The dimensionality of the latent space used to model the graph. If None, the method of Zhu and Godsie will be used to select an embedding dimension.

ase_kws : dict, optional (default={})

Dictionary of keyword arguments passed down to AdjacencySpectralEmbed, which is used to fit the model.

diag_aug_weight : int or float, optional (default=1)

Weighting used for diagonal augmentation, which is a form of regularization for fitting the RDPG model.

plus_c_weight : int or float, optional (default=1)

Weighting used for a constant scalar added to the adjacency matrix before embedding as a form of regularization.

Attributes:
latent_ : tuple, length 2, or np.ndarray, shape (n_verts, n_components)

The fit latent positions for the RDPG model. If a tuple, then the graph that was input to fit was directed, and the first and second elements of the tuple are the left and right latent positions, respectively. The left and right latent positions will both be of shape (n_verts, n_components). If latent_ is an array, then the graph that was input to fit was undirected and the left and right latent positions are the same.

p_mat_ : np.ndarray, shape (n_verts, n_verts)

Probability matrix \(P\) for the fit model, from which graphs could be sampled.

References

[R8a60fabb19f1-1]Athreya, A., Fishkind, D. E., Tang, M., Priebe, C. E., Park, Y., Vogelstein, J. T., ... & Sussman, D. L. (2018). Statistical inference on random dot product graphs: a survey. Journal of Machine Learning Research, 18(226), 1-92.
[R8a60fabb19f1-2]Zhu, M. and Ghodsi, A. (2006). Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2), pp.918-930.
fit(self, graph, y=None)[source]

Calculate the parameters for the given graph model

bic(self, graph)

Bayesian information criterion for the current model on the input graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph

Returns:
bic : float

The lower the better

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:
params : mapping of string to any

Parameter names mapped to their values.

mse(self, graph)

Compute mean square error for the current model on the input graph

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph

Returns:
mse : float

Mean square error for the model's fit P matrix

sample(self, n_samples=1)

Sample graphs (realizations) from the fitted model

Can only be called after the the model has been fit

Parameters:
n_samples : int (default 1), optional

The number of graphs to sample

Returns:
graphs : np.array (n_samples, n_verts, n_verts)

Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.

Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.

score(self, graph)

Compute the average log-likelihood over each potential edge of the given graph.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph. Must be same shape as model's p_mat_ attribute

Returns:
score : float

sum of log-loglikelihoods for each potential edge in input graph

score_samples(self, graph, clip=None)

Compute the weighted log probabilities for each potential edge.

Note that this implicitly assumes the input graph is indexed like the fit model.

Parameters:
graph : np.ndarray

Input graph. Must be same shape as model's p_mat_ attribute

clip : scalar or None, optional (default=None)

Values for which to clip probability matrix, entries less than c or more than 1 - c are set to c or 1 - c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.

Returns:
sample_scores : np.ndarray (size of graph)

log-likelihood per potential edge in the graph

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Returns:
self