Embedding¶
Decomposition¶

graspy.embed.
select_dimension
(X, n_components=None, n_elbows=2, threshold=None, return_likelihoods=False)[source]¶ Generates profile likelihood from array based on Zhu and Godsie method. Elbows correspond to the optimal embedding dimension.
Parameters:  X : 1d or 2d arraylike
Input array generate profile likelihoods for. If 1darray, it should be sorted in decreasing order. If 2darray, shape should be (n_samples, n_features).
 n_components : int, optional, default: None.
Number of components to embed. If None,
n_components = floor(log2(min(n_samples, n_features)))
. Ignored if X is 1darray. n_elbows : int, optional, default: 2.
Number of likelihood elbows to return. Must be > 1.
 threshold : float, int, optional, default: None
If given, only consider the singular values that are > threshold. Must be >= 0.
 return_likelihoods : bool, optional, default: False
If True, returns the all likelihoods associated with each elbow.
Returns:  elbows : list
Elbows indicate subsequent optimal embedding dimensions. Number of elbows may be less than n_elbows if there are not enough singular values.
 sing_vals : list
The singular values associated with each elbow.
 likelihoods : list of arraylike
Array of likelihoods of the corresponding to each elbow. Only returned if return_likelihoods is True.
References
[1] Zhu, M. and Ghodsi, A. (2006). Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2), pp.918930.

graspy.embed.
selectSVD
(X, n_components=None, n_elbows=2, algorithm='randomized', n_iter=5)[source]¶ Dimensionality reduction using SVD.
Performs linear dimensionality reduction by using either full singular value decomposition (SVD) or truncated SVD. Full SVD is performed using SciPy's wrapper for ARPACK, while truncated SVD is performed using either SciPy's wrapper for LAPACK or Sklearn's implementation of randomized SVD.
It also performs optimal dimensionality selectiong using Zhu & Godsie algorithm [1] if number of target dimension is not specified.
Parameters:  X : arraylike, shape (n_samples, n_features)
The data to perform svd on.
 n_components : int or None, default = None
Desired dimensionality of output data. If "full", n_components must be <= min(X.shape). Otherwise, n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by
select_dimension
usingn_elbows
argument. n_elbows : int, optional, default: 2
If n_components=None, then compute the optimal embedding dimension using select_dimension. Otherwise, ignored.
 algorithm : {'randomized' (default), 'full', 'truncated'}, optional
SVD solver to use:
 'randomized'
 Computes randomized svd using
sklearn.utils.extmath.randomized_svd
 'full'
 Computes full svd using
scipy.linalg.svd
 'truncated'
 Computes truncated svd using
scipy.sparse.linalg.svd
 n_iter : int, optional (default = 5)
Number of iterations for randomized SVD solver. Not used by 'full' or 'truncated'. The default is larger than the default in randomized_svd to handle sparse matrices that may have large slowly decaying spectrum.
Returns:  U: arraylike, shape (n_samples, n_components)
Left singular vectors corresponding to singular values.
 D: arraylike, shape (n_components)
Singular values in decreasing order, as a 1d array.
 V: arraylike, shape (n_components, n_samples)
Right singular vectors corresponding to singular values.
References
[1] Zhu, M. and Ghodsi, A. (2006). Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2), pp.918930.
Single graph embedding¶

class
graspy.embed.
AdjacencySpectralEmbed
(n_components=None, n_elbows=2, algorithm='randomized', n_iter=5, check_lcc=True)[source]¶ Class for computing the adjacency spectral embedding of a graph
The adjacency spectral embedding (ASE) is a kdimensional Euclidean representation of the graph based on its adjacency matrix [Rf6b1f8d417091]. It relies on an SVD to reduce the dimensionality to the specified k, or if k is unspecified, can find a number of dimensions automatically (see graspy.embed.selectSVD).
Parameters:  n_components : int or None, default = None
Desired dimensionality of output data. If "full", n_components must be <= min(X.shape). Otherwise, n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by
select_dimension
usingn_elbows
argument. n_elbows : int, optional, default: 2
If n_components=None, then compute the optimal embedding dimension using select_dimension. Otherwise, ignored.
 algorithm : {'randomized' (default), 'full', 'truncated'}, optional
SVD solver to use:
 'randomized'
 Computes randomized svd using
sklearn.utils.extmath.randomized_svd
 'full'
 Computes full svd using
scipy.linalg.svd
 'truncated'
 Computes truncated svd using
scipy.sparse.linalg.svd
 n_iter : int, optional (default = 5)
Number of iterations for randomized SVD solver. Not used by 'full' or 'truncated'. The default is larger than the default in randomized_svd to handle sparse matrices that may have large slowly decaying spectrum.
 check_lcc : bool , optional (defult = True)
Whether to check if input graph is connected. May result in nonoptimal results if the graph is unconnected. If True and input is unconnected, a UserWarning is thrown. Not checking for connectedness may result in faster computation.
Attributes:  latent_left_ : array, shape (n_samples, n_components)
Estimated left latent positions of the graph.
 latent_right_ : array, shape (n_samples, n_components), or None
Only computed when the graph is directed, or adjacency matrix is assymetric. Estimated right latent positions of the graph. Otherwise, None.
 singular_values_ : array, shape (n_components)
Singular values associated with the latent position matrices.
Notes
The singular value decomposition:
\[A = U \Sigma V^T\]is used to find an orthonormal basis for a matrix, which in our case is the adjacency matrix of the graph. These basis vectors (in the matrices U or V) are ordered according to the amount of variance they explain in the original matrix. By selecting a subset of these basis vectors (through our choice of dimensionality reduction) we can find a lower dimensional space in which to represent the graph.
References
[Rf6b1f8d417091] Sussman, D.L., Tang, M., Fishkind, D.E., Priebe, C.E. "A Consistent Adjacency Spectral Embedding for Stochastic Blockmodel Graphs," Journal of the American Statistical Association, Vol. 107(499), 2012 
fit
(self, graph, y=None)[source]¶ Fit ASE model to input graph
Parameters:  graph : array_like or networkx.Graph
Input graph to embed.
Returns:  self : returns an instance of self.

fit_transform
(self, graph, y=None)¶ Fit the model with graphs and apply the transformation.
n_dimension is either automatically determined or based on user input.
Parameters:  graph: np.ndarray or networkx.Graph
 y : Ignored
Returns:  out : np.ndarray, shape (n_vertices, n_dimension) OR tuple (len 2)
where both elements have shape (n_vertices, n_dimension) A single np.ndarray represents the latent position of an undirected graph, wheras a tuple represents the left and right latent positions for a directed graph

get_params
(self, deep=True)¶ Get parameters for this estimator.
Parameters:  deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:  params : mapping of string to any
Parameter names mapped to their values.

set_params
(self, **params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each component of a nested object.Returns:  self

class
graspy.embed.
LaplacianSpectralEmbed
(form='DAD', n_components=None, n_elbows=2, algorithm='randomized', n_iter=5, check_lcc=True, regularizer=None)[source]¶ Class for computing the laplacian spectral embedding of a graph
The laplacian spectral embedding (LSE) is a kdimensional Euclidean representation of the graph based on its Laplacian matrix [Re6134e8576641]. It relies on an SVD to reduce the dimensionality to the specified k, or if k is unspecified, can find a number of dimensions automatically.
Parameters:  form : {'DAD' (default), 'IDAD', 'RDAD'}, optional
Specifies the type of Laplacian normalization to use.
 n_components : int or None, default = None
Desired dimensionality of output data. If "full", n_components must be <= min(X.shape). Otherwise, n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by
select_dimension
usingn_elbows
argument. n_elbows : int, optional, default: 2
If n_components=None, then compute the optimal embedding dimension using select_dimension. Otherwise, ignored.
 algorithm : {'randomized' (default), 'full', 'truncated'}, optional
SVD solver to use:
 'randomized'
 Computes randomized svd using
sklearn.utils.extmath.randomized_svd
 'full'
 Computes full svd using
scipy.linalg.svd
 'truncated'
 Computes truncated svd using
scipy.sparse.linalg.svd
 n_iter : int, optional (default = 5)
Number of iterations for randomized SVD solver. Not used by 'full' or 'truncated'. The default is larger than the default in randomized_svd to handle sparse matrices that may have large slowly decaying spectrum.
 check_lcc : bool , optional (defult = True)
Whether to check if input graph is connected. May result in nonoptimal results if the graph is unconnected. If True and input is unconnected, a UserWarning is thrown. Not checking for connectedness may result in faster computation.
 regularizer: int, float or None, optional (default=None)
Constant to be added to the diagonal of degree matrix. If None, average node degree is added. If int or float, must be >= 0. Only used when
form
== 'RDAD'.
Attributes:  latent_left_ : array, shape (n_samples, n_components)
Estimated left latent positions of the graph.
 latent_right_ : array, shape (n_samples, n_components), or None
Only computed when the graph is directed, or adjacency matrix is assymetric. Estimated right latent positions of the graph. Otherwise, None.
 singular_values_ : array, shape (n_components)
Singular values associated with the latent position matrices.
Notes
The singular value decomposition:
\[A = U \Sigma V^T\]is used to find an orthonormal basis for a matrix, which in our case is the Laplacian matrix of the graph. These basis vectors (in the matrices U or V) are ordered according to the amount of variance they explain in the original matrix. By selecting a subset of these basis vectors (through our choice of dimensionality reduction) we can find a lower dimensional space in which to represent the graph.
References
[Re6134e8576641] Sussman, D.L., Tang, M., Fishkind, D.E., Priebe, C.E. "A Consistent Adjacency Spectral Embedding for Stochastic Blockmodel Graphs," Journal of the American Statistical Association, Vol. 107(499), 2012 
fit
(self, graph, y=None)[source]¶ Fit LSE model to input graph
By default, uses the Laplacian normalization of the form:
\[L = D^{1/2} A D^{1/2}\]Parameters:  graph : array_like or networkx.Graph
Input graph to embed. see graphstats.utils.import_graph
 y : Ignored
Returns:  self : returns an instance of self.

fit_transform
(self, graph, y=None)¶ Fit the model with graphs and apply the transformation.
n_dimension is either automatically determined or based on user input.
Parameters:  graph: np.ndarray or networkx.Graph
 y : Ignored
Returns:  out : np.ndarray, shape (n_vertices, n_dimension) OR tuple (len 2)
where both elements have shape (n_vertices, n_dimension) A single np.ndarray represents the latent position of an undirected graph, wheras a tuple represents the left and right latent positions for a directed graph

get_params
(self, deep=True)¶ Get parameters for this estimator.
Parameters:  deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:  params : mapping of string to any
Parameter names mapped to their values.

set_params
(self, **params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each component of a nested object.Returns:  self
Multiple graph embedding¶

class
graspy.embed.
OmnibusEmbed
(n_components=None, n_elbows=2, algorithm='randomized', n_iter=5, check_lcc=True)[source]¶ Omnibus embedding of arbitrary number of input graphs with matched vertex sets.
Given \(A_1, A_2, ..., A_m\) a collection of (possibly weighted) adjacency matrices of a collection \(m\) undirected graphs with matched vertices. Then the \((mn \times mn)\) omnibus matrix, \(M\), has the subgraph where \(M_{ij} = \frac{1}{2}(A_i + A_j)\). The omnibus matrix is then embedded using adjacency spectral embedding [R639bade8530a1].
Parameters:  n_components : int or None, default = None
Desired dimensionality of output data. If "full", n_components must be <= min(X.shape). Otherwise, n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by
select_dimension
usingn_elbows
argument. n_elbows : int, optional, default: 2
If n_components=None, then compute the optimal embedding dimension using select_dimension. Otherwise, ignored.
 algorithm : {'randomized' (default), 'full', 'truncated'}, optional
SVD solver to use:
 'randomized'
 Computes randomized svd using
sklearn.utils.extmath.randomized_svd
 'full'
 Computes full svd using
scipy.linalg.svd
 'truncated'
 Computes truncated svd using
scipy.sparse.linalg.svd
 n_iter : int, optional (default = 5)
Number of iterations for randomized SVD solver. Not used by 'full' or 'truncated'. The default is larger than the default in randomized_svd to handle sparse matrices that may have large slowly decaying spectrum.
 check_lcc : bool , optional (defult = True)
Whether to check if the average of all input graphs are connected. May result in nonoptimal results if the average graph is unconnected. If True and average graph is unconnected, a UserWarning is thrown.
Attributes:  n_graphs_ : int
Number of graphs
 n_vertices_ : int
Number of vertices in each graph
 latent_left_ : array, shape (n_graphs, n_vertices, n_components)
Estimated left latent positions of the graph.
 latent_right_ : array, shape (n_graphs, n_vertices, n_components), or None
Only computed when the graph is directed, or adjacency matrix is asymmetric. Estimated right latent positions of the graph. Otherwise, None.
 singular_values_ : array, shape (n_components)
Singular values associated with the latent position matrices.
References
[R639bade8530a1] Levin, K., Athreya, A., Tang, M., Lyzinski, V., & Priebe, C. E. (2017, November). A central limit theorem for an omnibus embedding of multiple random dot product graphs. In Data Mining Workshops (ICDMW), 2017 IEEE International Conference on (pp. 964967). IEEE. 
fit
(self, graphs, y=None)[source]¶ Fit the model with graphs.
Parameters:  graphs : list of nx.Graph or ndarray, or ndarray
If list of nx.Graph, each Graph must contain same number of nodes. If list of ndarray, each array must have shape (n_vertices, n_vertices). If ndarray, then array must have shape (n_graphs, n_vertices, n_vertices).
 y : Ignored
Returns:  self : returns an instance of self.

fit_transform
(self, graphs, y=None)[source]¶ Fit the model with graphs and apply the embedding on graphs. n_components is either automatically determined or based on user input.
Parameters:  graphs : list of nx.Graph or ndarray, or ndarray
If list of nx.Graph, each Graph must contain same number of nodes. If list of ndarray, each array must have shape (n_vertices, n_vertices). If ndarray, then array must have shape (n_graphs, n_vertices, n_vertices).
 y : Ignored
Returns:  out : arraylike, shape (n_graphs, n_vertices, n_components) if input
graphs were symmetric. If graphs were directed, returns tuple of two arrays (same shape as above) where the first corresponds to the left latent positions, and the right to the right latent positions

get_params
(self, deep=True)¶ Get parameters for this estimator.
Parameters:  deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:  params : mapping of string to any
Parameter names mapped to their values.

set_params
(self, **params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each component of a nested object.Returns:  self

class
graspy.embed.
MultipleASE
(n_components=None, n_elbows=2, algorithm='randomized', n_iter=5, scaled=False)[source]¶ Multiple Adjacency Spectral Embedding (MASE) embeds arbitrary number of input graphs with matched vertex sets.
For a population of undirected graphs, MASE assumes that the population of graphs is sampled from \(VR^{(i)}V^T\) where \(V \in \mathbb{R}^{n\times d}\) and \(R^{(i)} \in \mathbb{R}^{d\times d}\). Score matrices, \(R^{(i)}\), are allowed to vary for each graph, but are symmetric. All graphs share a common a latent position matrix \(V\).
For a population of directed graphs, MASE assumes that the population is sampled from \(UR^{(i)}V^T\) where \(U \in \mathbb{R}^{n\times d_1}\), \(V \in \mathbb{R}^{n\times d_2}\), and \(R^{(i)} \in \mathbb{R}^{d_1\times d_2}\). In this case, score matrices \(R^{(i)}\) can be assymetric and nonsquare, but all graphs still share a common latent position matrices \(U\) and \(V\).
Parameters:  n_components : int or None, (default=None)
Desired dimensionality of output data. If algorithm=="full", n_components must be <= min(X.shape). Otherwise, n_components must be < min(X.shape). If None, then optimal dimensions will be chosen by select_dimension.
 n_elbows : int, optional (default=2)
If n_compoents=None, then compute the optimal embedding dimension using select_dimension. Otherwise, ignored.
 algorithm : {'full', 'truncated', 'randomized' (default)}, optional
SVD solver to use:
 'full'
 Computes full svd using
scipy.linalg.svd
 'truncated'
 Computes truncated svd using
scipy.sparse.linalg.svd
 'randomized'
 Computes randomized svd using
sklearn.utils.extmath.randomized_svd
 n_iter : int, optional (default=5)
Number of iterations for randomized SVD solver. Not used by 'full' or 'truncated'. The default is larger than the default in randomized_svd to handle sparse matrices that may have large slowly decaying spectrum.
 scaled : bool, optional (default=False)
Whether to scale individual eigenvectors with eigenvalues in first embedding stage.
Attributes:  n_graphs_ : int
Number of graphs
 n_vertices_ : int
Number of vertices in each graph
 latent_left_ : array, shape (n_samples, n_components)
Estimated left latent positions of the graph.
 latent_right_ : array, shape (n_samples, n_components), or None
Estimated right latent positions of the graph. Only computed when the an input graph is directed, or adjacency matrix is assymetric. Otherwise, None.
 scores_ : array, shape (n_samples, n_components, n_components)
Estimated \(\hat{R}\) matrices for each input graph.
Notes
When an input graph is directed, n_components of latent_left_ may not be equal to n_components of latent_right_.

fit
(self, graphs, y=None)[source]¶ Fit the model with graphs.
Parameters:  graphs : list of nx.Graph or ndarray, or ndarray
If list of nx.Graph, each Graph must contain same number of nodes. If list of ndarray, each array must have shape (n_vertices, n_vertices). If ndarray, then array must have shape (n_graphs, n_vertices, n_vertices).
 y : Ignored
Returns:  self : returns an instance of self.

fit_transform
(self, graphs, y=None)[source]¶ Fit the model with graphs and apply the embedding on graphs. n_components is either automatically determined or based on user input.
Parameters:  graphs : list of nx.Graph or ndarray, or ndarray
If list of nx.Graph, each Graph must contain same number of nodes. If list of ndarray, each array must have shape (n_vertices, n_vertices). If ndarray, then array must have shape (n_graphs, n_vertices, n_vertices).
 y : Ignored
Returns:  out : arraylike, shape (n_graphs, n_vertices, n_components) if input
graphs were symmetric. If graphs were directed, returns tuple of two arrays (same shape as above) where the first corresponds to the left latent positions, and the right to the right latent positions

get_params
(self, deep=True)¶ Get parameters for this estimator.
Parameters:  deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:  params : mapping of string to any
Parameter names mapped to their values.

set_params
(self, **params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each component of a nested object.Returns:  self
Dissimilarity graph embedding¶

class
graspy.embed.
ClassicalMDS
(n_components=None, dissimilarity='euclidean')[source]¶ Classical multidimensional scaling (cMDS).
cMDS seeks a lowdimensional representation of the data in which the distances respect well the distances in the original highdimensional space.
Parameters:  n_components : int, or None
Number of components to keep. If None, then it will run
select_dimension
to find the optimal embedding dimension. dissimilarity : 'euclidean'  'precomputed', optional, default: 'euclidean'
Dissimilarity measure to use:
 'euclidean':
 Pairwise Euclidean distances between points in the dataset.
 'precomputed':
 Precomputed dissimilarities are passed directly to
fit
andfit_transform
.
Attributes:  n_components : int
Equals the parameter n_components. If input n_components was None, then equals the optimal embedding dimension.
 components_ : array, shape (n_components, n_features)
Principal axes in feature space.
 singular_values_ : array, shape (n_components,)
The singular values corresponding to each of the selected components.
 dissimilarity_matrix_ : array, shape (n_features, n_features)
Dissimilarity matrix
References
Wickelmaier, Florian. "An introduction to MDS." Sound Quality Research Unit, Aalborg University, Denmark 46.5 (2003).

fit
(self, X, y=None)[source]¶ Fit the model with X.
Parameters:  X : array_like
If
dissimilarity=='precomputed'
, the input should be the dissimilarity matrix with shape (n_samples, n_samples). Ifdissimilarity=='euclidean'
, then the input should be 2darray with shape (n_samples, n_features) or a 3darray with shape (n_samples, n_features_1, n_features_2). y : Ignored

fit_transform
(self, X, y=None)[source]¶ Fit the data from X, and returns the embedded coordinates.
Parameters:  X : ndarray
If
dissimilarity=='precomputed'
, the input should be the dissimilarity matrix with shape (n_samples, n_samples). Ifdissimilarity=='euclidean'
, then the input should be array with shape (n_samples, n_features) or a ndarray with shape (n_samples, n_features_1, n_features_2, ..., n_features_d). First axis of ndarray must ben_samples
. y : Ignored
Returns:  X_new : arraylike, shape (n_samples, n_components)
Embedded input.

get_params
(self, deep=True)¶ Get parameters for this estimator.
Parameters:  deep : boolean, optional
If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:  params : mapping of string to any
Parameter names mapped to their values.

set_params
(self, **params)¶ Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
<component>__<parameter>
so that it's possible to update each component of a nested object.Returns:  self