Models¶
ErdosReyni models¶

class
graspy.models.
EREstimator
(directed=True, loops=False)[source]¶ ErdosReyni Model
The ErdosReyni (ER) model is a simple random graph model in which the probability of any potential edge in the graph existing is the same for any two nodes \(i\) and \(j\).
\(P_{ij} = p\) for all i, j
Read more in the tutorials
Parameters:  directed : boolean, optional (default=True)
Whether to treat the input graph as directed. Even if a directed graph is inupt, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed.
 loops : boolean, optional (default=False)
Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.
Attributes:  p_ : float
Value between 0 and 1 (inclusive) representing the probability of any edge in the ER graph model
 p_mat_ : np.ndarray, shape (n_verts, n_verts)
Probability matrix \(P\) for the fit model, from which graphs could be sampled.
References
[Rce4f95a7a4101] https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model 
fit
(self, graph, y=None)[source]¶ Fit the SBM to a graph, optionally with known block labels
If y is None, the block assignments for each vertex will first be estimated.
Parameters:  graph : array_like or networkx.Graph
Input graph to fit
 y : array_like, length graph.shape[0], optional
Categorical labels for the block assignments of the graph

bic
(self, graph)¶ Bayesian information criterion for the current model on the input graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph
Returns:  bic : float
The lower the better

get_params
(self, deep=True)¶ Get parameters for this estimator.
Parameters:  deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:  params : mapping of string to any
Parameter names mapped to their values.

mse
(self, graph)¶ Compute mean square error for the current model on the input graph
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph
Returns:  mse : float
Mean square error for the model's fit P matrix

sample
(self, n_samples=1)¶ Sample graphs (realizations) from the fitted model
Can only be called after the the model has been fit
Parameters:  n_samples : int (default 1), optional
The number of graphs to sample
Returns:  graphs : np.array (n_samples, n_verts, n_verts)
Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.
Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.

score
(self, graph)¶ Compute the average loglikelihood over each potential edge of the given graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph. Must be same shape as model's
p_mat_
attribute
Returns:  score : float
sum of logloglikelihoods for each potential edge in input graph

score_samples
(self, graph, clip=None)¶ Compute the weighted log probabilities for each potential edge.
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph. Must be same shape as model's
p_mat_
attribute clip : scalar or None, optional (default=None)
Values for which to clip probability matrix, entries less than c or more than 1  c are set to c or 1  c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.
Returns:  sample_scores : np.ndarray (size of
graph
) loglikelihood per potential edge in the graph

set_params
(self, **params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each component of a nested object.Returns:  self

class
graspy.models.
DCEREstimator
(directed=True, loops=False, degree_directed=False)[source]¶ Degreecorrected ErdosReyni Model
The Degreecorrected ErdosReyni (DCER) model is an extension of the ER model in which each node has an additional "promiscuity" parameter \(\theta_i\) that determines its expected degree in the graph.
\(P_{ij} = \theta_i \theta_j p\)
Read more in the tutorials
Parameters:  directed : boolean, optional (default=True)
Whether to treat the input graph as directed. Even if a directed graph is inupt, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed.
 loops : boolean, optional (default=False)
Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.
 degree_directed : boolean
Whether to allow seperate degree correction parameters for the in and out degree of each node. Ignored if
directed
is False.
Attributes:  p_ : float
The \(p\) parameter as described in the above model, which weights the overall probability of connections between any two nodes.
 p_mat_ : np.ndarray, shape (n_verts, n_verts)
Probability matrix \(P\) for the fit model, from which graphs could be sampled.
 degree_corrections_ : np.ndarray, shape (n_verts, 1) or (n_verts, 2)
Degree correction vector(s) \(\theta\). If
degree_directed
parameter was False, then will be of shape (n_verts, 1) and element i represents the degree correction for node \(i\). Otherwise, the first column contains out degree corrections and the second column contains in degree corrections.
Notes
The DCER model is rarely mentioned in literature, though it is simply a special case of the DCSBM where there is only one community.
References
[R52fcc65416a41] https://en.wikipedia.org/wiki/Erd%C5%91s%E2%80%93R%C3%A9nyi_model [R52fcc65416a42] Karrer, B., & Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical review E, 83(1), 016107. 
fit
(self, graph, y=None)[source]¶ Fit the DCSBM to a graph, optionally with known block labels
If y is None, the block assignments for each vertex will first be estimated.
Parameters:  graph : array_like or networkx.Graph
Input graph to fit
 y : array_like, length graph.shape[0], optional
Categorical labels for the block assignments of the graph
Returns:  self :
DCSBMEstimator
object Fitted instance of self

bic
(self, graph)¶ Bayesian information criterion for the current model on the input graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph
Returns:  bic : float
The lower the better

get_params
(self, deep=True)¶ Get parameters for this estimator.
Parameters:  deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:  params : mapping of string to any
Parameter names mapped to their values.

mse
(self, graph)¶ Compute mean square error for the current model on the input graph
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph
Returns:  mse : float
Mean square error for the model's fit P matrix

sample
(self, n_samples=1)¶ Sample graphs (realizations) from the fitted model
Can only be called after the the model has been fit
Parameters:  n_samples : int (default 1), optional
The number of graphs to sample
Returns:  graphs : np.array (n_samples, n_verts, n_verts)
Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.
Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.

score
(self, graph)¶ Compute the average loglikelihood over each potential edge of the given graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph. Must be same shape as model's
p_mat_
attribute
Returns:  score : float
sum of logloglikelihoods for each potential edge in input graph

score_samples
(self, graph, clip=None)¶ Compute the weighted log probabilities for each potential edge.
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph. Must be same shape as model's
p_mat_
attribute clip : scalar or None, optional (default=None)
Values for which to clip probability matrix, entries less than c or more than 1  c are set to c or 1  c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.
Returns:  sample_scores : np.ndarray (size of
graph
) loglikelihood per potential edge in the graph

set_params
(self, **params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each component of a nested object.Returns:  self
Stochastic block models¶

class
graspy.models.
SBMEstimator
(directed=True, loops=False, n_components=None, min_comm=1, max_comm=10, cluster_kws={}, embed_kws={})[source]¶ Stochastic Block Model
The stochastic block model (SBM) represents each node as belonging to a block (or community). For a given potential edge between node \(i\) and \(j\), the probability of an edge existing is specified by the block that nodes \(i\) and \(j\) belong to:
\(P_{ij} = B_{\tau_i \tau_j}\)
where \(B \in \mathbb{[0, 1]}^{K x K}\) and \(\tau\) is an n_nodes length vector specifying which block each node belongs to.
Read more in the tutorials
Parameters:  directed : boolean, optional (default=True)
Whether to treat the input graph as directed. Even if a directed graph is inupt, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed.
 loops : boolean, optional (default=False)
Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.
 n_components : int, optional (default=None)
Desired dimensionality of embedding for clustering to find communities.
n_components
must be< min(X.shape)
. If None, then optimal dimensions will be chosen byselect_dimension`()
. min_comm : int, optional (default=1)
The minimum number of communities (blocks) to consider.
 max_comm : int, optional (default=10)
The maximum number of communities (blocks) to consider (inclusive).
 cluster_kws : dict, optional (default={})
Additional kwargs passed down to
GaussianCluster
 embed_kws : dict, optional (default={})
Additional kwargs passed down to
AdjacencySpectralEmbed
Attributes:  block_p_ : np.ndarray, shape (n_blocks, n_blocks)
The block probability matrix \(B\), where the element \(B_{i, j}\) represents the probability of an edge between block \(i\) and block \(j\).
 p_mat_ : np.ndarray, shape (n_verts, n_verts)
Probability matrix \(P\) for the fit model, from which graphs could be sampled.
 vertex_assignments_ : np.ndarray, shape (n_verts)
A vector of integer labels corresponding to the predicted block that each node belongs to if
y
was not passed during the call tofit
. block_weights_ : np.ndarray, shape (n_blocks)
Contains the proportion of nodes that belong to each block in the fit model.
References
[R09567a76fb041] Holland, P. W., Laskey, K. B., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social networks, 5(2), 109137. 
fit
(self, graph, y=None)[source]¶ Fit the SBM to a graph, optionally with known block labels
If y is None, the block assignments for each vertex will first be estimated.
Parameters:  graph : array_like or networkx.Graph
Input graph to fit
 y : array_like, length graph.shape[0], optional
Categorical labels for the block assignments of the graph

bic
(self, graph)¶ Bayesian information criterion for the current model on the input graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph
Returns:  bic : float
The lower the better

get_params
(self, deep=True)¶ Get parameters for this estimator.
Parameters:  deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:  params : mapping of string to any
Parameter names mapped to their values.

mse
(self, graph)¶ Compute mean square error for the current model on the input graph
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph
Returns:  mse : float
Mean square error for the model's fit P matrix

sample
(self, n_samples=1)¶ Sample graphs (realizations) from the fitted model
Can only be called after the the model has been fit
Parameters:  n_samples : int (default 1), optional
The number of graphs to sample
Returns:  graphs : np.array (n_samples, n_verts, n_verts)
Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.
Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.

score
(self, graph)¶ Compute the average loglikelihood over each potential edge of the given graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph. Must be same shape as model's
p_mat_
attribute
Returns:  score : float
sum of logloglikelihoods for each potential edge in input graph

score_samples
(self, graph, clip=None)¶ Compute the weighted log probabilities for each potential edge.
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph. Must be same shape as model's
p_mat_
attribute clip : scalar or None, optional (default=None)
Values for which to clip probability matrix, entries less than c or more than 1  c are set to c or 1  c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.
Returns:  sample_scores : np.ndarray (size of
graph
) loglikelihood per potential edge in the graph

set_params
(self, **params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each component of a nested object.Returns:  self

class
graspy.models.
DCSBMEstimator
(degree_directed=False, directed=True, loops=False, n_components=None, min_comm=1, max_comm=10, cluster_kws={}, embed_kws={})[source]¶ Degreecorrected Stochastic Block Model
The degreecorrected stochastic block model (DCSBM) represents each node as belonging to a block (or community). For a given potential edge between node \(i\) and \(j\), the probability of an edge existing is specified by the block that nodes \(i\) and \(j\) belong to as in the SBM. However, an additional "promiscuity" parameter \(\theta\) is added for each node, allowing the vertices within a block to have heterogeneous expected degree distributions:
\(P_{ij} = \theta_i \theta_j B_{\tau_i \tau_j}\)
where \(B \in \mathbb{[0, 1]}^{K x K}\) \(\tau\) is an n_nodes length vector specifying which block each node belongs to, and \(\theta\) is an n_nodes length vector specifiying the degree correction for each node.
The
degree_directed
parameter of this model allows the degree correction parameter to be different for the in and out degree of each node:\(P_{ij} = \theta_i \eta_j B_{\tau_i \tau_j}\)
where \(\theta\) and \(\eta\) need not be the same.
Read more in the tutorials
Parameters:  directed : boolean, optional (default=True)
Whether to treat the input graph as directed. Even if a directed graph is inupt, this determines whether to force symmetry upon the block probability matrix fit for the SBM. It will also determine whether graphs sampled from the model are directed.
 degree_directed : boolean, optional (default=False)
Whether to fit an "in" and "out" degree correction for each node. In the degree_directed case, the fit model can have a different expected in and out degree for each node.
 loops : boolean, optional (default=False)
Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.
 n_components : int, optional (default=None)
Desired dimensionality of embedding for clustering to find communities.
n_components
must be< min(X.shape)
. If None, then optimal dimensions will be chosen byselect_dimension`()
. min_comm : int, optional (default=1)
The minimum number of communities (blocks) to consider.
 max_comm : int, optional (default=10)
The maximum number of communities (blocks) to consider (inclusive).
 cluster_kws : dict, optional (default={})
Additional kwargs passed down to
GaussianCluster
 embed_kws : dict, optional (default={})
Additional kwargs passed down to
LaplacianSpectralEmbed
Attributes:  block_p_ : np.ndarray, shape (n_blocks, n_blocks)
The block probability matrix \(B\), where the element \(B_{i, j}\) represents the expected number of edges between block \(i\) and block \(j\).
 p_mat_ : np.ndarray, shape (n_verts, n_verts)
Probability matrix \(P\) for the fit model, from which graphs could be sampled.
 degree_corrections_ : np.ndarray, shape (n_verts, 1) or (n_verts, 2)
Degree correction vector(s) \(\theta\). If degree_directed parameter was False, then will be of shape (n_verts, 1) and element \(i\) represents the degree correction for node \(i\). Otherwise, the first column contains out degree corrections and the second column contains in degree corrections.
 vertex_assignments_ : np.ndarray, shape (n_verts)
A vector of integer labels corresponding to the predicted block that each node belongs to if
y
was not passed during the call tofit
. block_weights_ : np.ndarray, shape (n_blocks)
Contains the proportion of nodes that belong to each block in the fit model.
Notes
Note that many examples in the literature describe the DCSBM as being sampled with a Poisson distribution. Here, we implement this model with a Bernoulli. When individual edge probabilities are relatively low these two distributions will yield similar results.
References
[R3dc0ad7dd21a1] Karrer, B., & Newman, M. E. (2011). Stochastic blockmodels and community structure in networks. Physical review E, 83(1), 016107. 
fit
(self, graph, y=None)[source]¶ Fit the DCSBM to a graph, optionally with known block labels
If y is None, the block assignments for each vertex will first be estimated.
Parameters:  graph : array_like or networkx.Graph
Input graph to fit
 y : array_like, length graph.shape[0], optional
Categorical labels for the block assignments of the graph
Returns:  self :
DCSBMEstimator
object Fitted instance of self

bic
(self, graph)¶ Bayesian information criterion for the current model on the input graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph
Returns:  bic : float
The lower the better

get_params
(self, deep=True)¶ Get parameters for this estimator.
Parameters:  deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:  params : mapping of string to any
Parameter names mapped to their values.

mse
(self, graph)¶ Compute mean square error for the current model on the input graph
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph
Returns:  mse : float
Mean square error for the model's fit P matrix

sample
(self, n_samples=1)¶ Sample graphs (realizations) from the fitted model
Can only be called after the the model has been fit
Parameters:  n_samples : int (default 1), optional
The number of graphs to sample
Returns:  graphs : np.array (n_samples, n_verts, n_verts)
Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.
Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.

score
(self, graph)¶ Compute the average loglikelihood over each potential edge of the given graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph. Must be same shape as model's
p_mat_
attribute
Returns:  score : float
sum of logloglikelihoods for each potential edge in input graph

score_samples
(self, graph, clip=None)¶ Compute the weighted log probabilities for each potential edge.
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph. Must be same shape as model's
p_mat_
attribute clip : scalar or None, optional (default=None)
Values for which to clip probability matrix, entries less than c or more than 1  c are set to c or 1  c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.
Returns:  sample_scores : np.ndarray (size of
graph
) loglikelihood per potential edge in the graph

set_params
(self, **params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each component of a nested object.Returns:  self
Latent position models¶

class
graspy.models.
RDPGEstimator
(loops=False, n_components=None, ase_kws={}, diag_aug_weight=1, plus_c_weight=1)[source]¶ Random Dot Product Graph
Under the random dot product graph model, each node is assumed to have a "latent position" in some \(d\)dimensional Euclidian space. This vector dictates that node's probability of connection to other nodes. For a given pair of nodes \(i\) and \(j\), the probability of connection is the dot product between their latent positions:
\(P_{ij} = \langle x_i, y_j \rangle\)
where \(x_i\) is the left latent position of node \(i\), and \(y_j\) is the right latent position of node \(j\). If the graph being modeled is is undirected, then \(x_i = y_i\). Latent positions can be estimated via
AdjacencySpectralEmbed
.Read more in the tutorials
Parameters:  loops : boolean, optional (default=False)
Whether to allow entries on the diagonal of the adjacency matrix, i.e. loops in the graph where a node connects to itself.
 n_components : int, optional (default=None)
The dimensionality of the latent space used to model the graph. If None, the method of Zhu and Godsie will be used to select an embedding dimension.
 ase_kws : dict, optional (default={})
Dictionary of keyword arguments passed down to
AdjacencySpectralEmbed
, which is used to fit the model. diag_aug_weight : int or float, optional (default=1)
Weighting used for diagonal augmentation, which is a form of regularization for fitting the RDPG model.
 plus_c_weight : int or float, optional (default=1)
Weighting used for a constant scalar added to the adjacency matrix before embedding as a form of regularization.
Attributes:  latent_ : tuple, length 2, or np.ndarray, shape (n_verts, n_components)
The fit latent positions for the RDPG model. If a tuple, then the graph that was input to fit was directed, and the first and second elements of the tuple are the left and right latent positions, respectively. The left and right latent positions will both be of shape (n_verts, n_components). If latent_ is an array, then the graph that was input to fit was undirected and the left and right latent positions are the same.
 p_mat_ : np.ndarray, shape (n_verts, n_verts)
Probability matrix \(P\) for the fit model, from which graphs could be sampled.
References
[R8a60fabb19f11] Athreya, A., Fishkind, D. E., Tang, M., Priebe, C. E., Park, Y., Vogelstein, J. T., ... & Sussman, D. L. (2018). Statistical inference on random dot product graphs: a survey. Journal of Machine Learning Research, 18(226), 192. [R8a60fabb19f12] Zhu, M. and Ghodsi, A. (2006). Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2), pp.918930. 
bic
(self, graph)¶ Bayesian information criterion for the current model on the input graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph
Returns:  bic : float
The lower the better

get_params
(self, deep=True)¶ Get parameters for this estimator.
Parameters:  deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:  params : mapping of string to any
Parameter names mapped to their values.

mse
(self, graph)¶ Compute mean square error for the current model on the input graph
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph
Returns:  mse : float
Mean square error for the model's fit P matrix

sample
(self, n_samples=1)¶ Sample graphs (realizations) from the fitted model
Can only be called after the the model has been fit
Parameters:  n_samples : int (default 1), optional
The number of graphs to sample
Returns:  graphs : np.array (n_samples, n_verts, n_verts)
Array of sampled graphs, where the first dimension indexes each sample, and the other dimensions represent (n_verts x n_verts) adjacency matrices for the sampled graphs.
Note that if only one sample is drawn, a (1, n_verts, n_verts) array will still be returned.

score
(self, graph)¶ Compute the average loglikelihood over each potential edge of the given graph.
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph. Must be same shape as model's
p_mat_
attribute
Returns:  score : float
sum of logloglikelihoods for each potential edge in input graph

score_samples
(self, graph, clip=None)¶ Compute the weighted log probabilities for each potential edge.
Note that this implicitly assumes the input graph is indexed like the fit model.
Parameters:  graph : np.ndarray
Input graph. Must be same shape as model's
p_mat_
attribute clip : scalar or None, optional (default=None)
Values for which to clip probability matrix, entries less than c or more than 1  c are set to c or 1  c, respectively. If None, values will not be clipped in the likelihood calculation, which may result in poorly behaved likelihoods depending on the model.
Returns:  sample_scores : np.ndarray (size of
graph
) loglikelihood per potential edge in the graph

set_params
(self, **params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each component of a nested object.Returns:  self