Nomination

Spectral Vertex Nomination

class graspologic.nominate.SpectralVertexNomination(input_graph: bool = True, embedder: Union[str, graspologic.embed.base.BaseSpectralEmbed] = 'ase', n_neighbors=None, metric: str = 'euclidean', metric_params: dict = None)[source]

Class for spectral vertex nomination on a single graph.

Given a graph \(G=(V,E)\) and a subset of \(V\) called \(S\) (the "seed"), Single Graph Vertex Nomination is the problem of ranking all \(V\) in order of relation to members of \(S\). Spectral Vertex Nomination solves this problem by embedding \(G\) into a low dimensional euclidean space (tutorials), and then generating a nomination list by some distance based algorithm. In the simple unattributed case, for each seed vertex \(u\), the other vertices are ranked in order of euclidean distance from \(u\).

Parameters:

input_graph: bool, default = True

Flag whether to expect two full graphs, or the embeddings.

  • True
    .fit and .fit_predict() expect graphs as adjacency matrix, provided as ndarray of shape (n, n). They will be embedded using the specified embedder.
  • False
    .fit() and .fit_predict() expect an embedding of the graph, i.e. a ndarray of size (n, d).

embedder: str or BaseEmbed, default = 'ASE'

May provide either a embed object or a string indicating which embedding method to use, which may be either: "ASE" for AdjacencySpectralEmbed or "LSE" for LaplacianSpectralEmbed.

n_neighbors: int, default=None

The number of vertices to nominate for each seed.

metric : str, default = 'euclidean'

Distance metric to use when finding the nearest neighbors, all sklearn metrics available.

metric_params : dict, default = None

Arguments for the sklearn DistanceMetric specified via metric parameter.

Attributes:

nearest_neighbors_ : sklearn.neighbors.NearestNeighbors

A fit sklearn NearestNeighbors classifier used to find closest vertices to each seed.

References

[R0bdc65bd2adf-1]Fishkind, D. E.; Lyzinski, V.; Pao, H.; Chen, L.; Priebe, C. E. Vertex nomination schemes for membership prediction. Ann. Appl. Stat. 9 2015. https://projecteuclid.org/euclid.aoas/1446488749
[R0bdc65bd2adf-2]Jordan Yoder, Li Chen, Henry Pao, Eric Bridgeford, Keith Levin, Donniell E. Fishkind, Carey Priebe, Vince Lyzinski, Vertex nomination: The canonical sampling and the extended spectral nomination schemes, Computational Statistics & Data Analysis, Volume 145, 2020. http://www.sciencedirect.com/science/article/pii/S0167947320300074
fit(self, X: numpy.ndarray, y=None)[source]

Constructs the embedding if not provided, then calculates the pairwise distance from each seed to each vertex in graph.

Parameters:

X : np.ndarray

  • If input_graph is True
    Expects a graph as an adjacency matrix, i.e. an ndarray of shape (n, n). Will be embedded using the specified embedder.
  • If input_graph is False
    Expects an embedding of the graph, i.e. a ndarray of size (n, d).

y : NoneType

Included by sklearn convention.

Returns:

self : object

Returns an instance of self.

predict(self, y: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][source]

Nominates vertices for each seed vertex. Methodology is distance based ranking.

Parameters:

y : np.ndarray

The indices of the seed vertices. Should be a dim 1 array with length less than \(|V|\).

Returns:

Nomination List : np.ndarray

Shape is (number_vertices, number_vertices_in_seed) . Each column is a seed vertex, and the rows of each column are a list of vertex indexes from the original adjacency matrix in order degree of match.

Distance Matrix : np.ndarray

The matrix of distances associated with each element of the nomination list.

fit_predict(self, X: numpy.ndarray, y: numpy.ndarray) → Tuple[numpy.ndarray, numpy.ndarray][source]

Calls this class' fit and then predict methods.

Parameters:

X : np.ndarray

  • If input_graph is True
    Expects a graph as an adjacency matrix, i.e. an ndarray of shape (n, n). Will be embedded using the specified embedder.
  • If input_graph is False
    Expects an embedding of the graph, i.e. a ndarray of size (n, d).

y : np.ndarray.

List of unattributed seed vertex indices.

Returns:

Nomination List : np.ndarray

Shape is (number_vertices, number_vertices_in_seed) . Each column is a seed vertex, and the rows of each column are a list of vertex indexes from the original adjacency matrix in order degree of match.

Distance Matrix : np.ndarray

The matrix of distances associated with each element of the nomination list.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict

Parameter names mapped to their values.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**params : dict

Estimator parameters.

Returns:

self : estimator instance

Estimator instance.

Vertex Nomination via SGM

class graspologic.nominate.VNviaSGM(order_voi_subgraph=1, order_seeds_subgraph=1, n_init=100, max_nominations=None, graph_match_kws={})[source]

This class implements Vertex Nomination via Seeded Graph Matching (VNviaSGM) with the algorithm described in [1].

Rather than providing a 1-1 matching for the vertices of two graphs, as in GraphMatch, VNviaSGM ranks the potential matches for a vertex of interst (VOI) in one to graph to the vertices in another graph, based on probability of matching.

Parameters:

order_voi_subgraph: int, positive (default = 1)

Order used to create induced subgraph on A about VOI where the max distance between VOI and other nodes is order_voi_subgraph. This induced subgraph will be used to determine what seeds are used when the SGM algorithm is called. If no seeds are in this subgraph about VOI, then a UserWarning is thrown, and nomination_list_ is None.

order_seeds_subgraph: int, positive (default = 1)

Order used to create induced subgraphs on A and B. These subgraphs are centered about the seeds that were determined by the subgraph generated by order_voi_subgraph. These two subgraphs will be passed into the SGM algorithm.

n_init: int, positive (default = 100)

Number of random initializations of the seeded graph matching algorithm (SGM). Increasing the number of restarts will make the probabilities returned more precise.

max_nominations: int (default = None)

Max number of nominations to include in the nomination list. If None is passed, then all nominations computed will be returned.

graph_match_kws : dict (default = {})

Gives users the option to pass custom arguments to the graph matching algorithm. Format should be {'arg_name': arg_value, ...}. See GraphMatch

Attributes:

n_seeds_: int

Number of seeds passed in seedsA that occured in the induced subgraph about VOI

nomination_list_: 2d-array

An array containing vertex nominations in the form nomination list = [[j, p_val],...] where p_val is the probability that the VOI matches to node j in graph B (sorted by descending probability)

Notes

VNviaSGM generates an initial induced subgraph about the VOI to determine which seeds are close enough to be used. If no seeds are close enough, then a warning is thrown and nomination_list_ is set to None.

All the seeds that are close enough are then used to generate subgraphs in both A and B. These subgraphs are matched using several random initializations of the seeded graph matching algorithm (SGM), and a nomination list is returned. See GraphMatch for SGM docs

References

[R9555ba47fdc0-1]Patsolic, HG, Park, Y, Lyzinski, V, Priebe, CE. Vertex nomination via seeded graph matching. Stat Anal Data Min: The ASA Data Sci Journal. 2020; 13: 229– 244. https://doi.org/10.1002/sam.11454
fit(self, A, B, voi, seeds)[source]

Fits the model to two graphs.

Parameters:

A: 2d-array, square

Adjacency matrix, the graph where voi is known

B: 2d-array, square

Adjacency matrix, the graph where voi is not known

voi: int

Vertex of interest

seeds: list, 2d-array

List of length two, of form [seedsA, seedsB]. The elements of seedsA and seedsB are vertex indices from A and B, respectively, which are known to be matched; that is, vertex seedsA[i] is matched to vertex seedsB[i]. Note: len(seedsA)==len(seedsB).

Returns:
self: An instance of self
fit_predict(self, A, B, voi, seeds)[source]

Fits model to two adjacency matrices and returns nomination list

Parameters:

A: 2d-array, square

Adjacency matrix, the graph where voi is known

B: 2d-array, square

Adjacency matrix, the graph where voi is not known

voi: int

Vertex of interest

seeds: list, 2d-array

List of length two, of form [seedsA, seedsB]. The elements of seedsA and seedsB are vertex indices from A and B, respectively, which are known to be matched; that is, vertex seedsA[i] is matched to vertex seedsB[i]. Note: len(seedsA)==len(seedsB).

Returns:

nomination_list_ : 2d-array

The nomination list.

get_params(self, deep=True)

Get parameters for this estimator.

Parameters:

deep : bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params : dict

Parameter names mapped to their values.

set_params(self, **params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.

Parameters:

**params : dict

Estimator parameters.

Returns:

self : estimator instance

Estimator instance.