# Preprocessing

## Graph Cuts

### Constants

graspologic.preprocessing.LARGER_THAN_INCLUSIVE

Cut any edge or node > the cut_threshold

graspologic.preprocessing.LARGER_THAN_EXCLUSIVE

Cut any edge or node >= the cut_threshold

graspologic.preprocessing.SMALLER_THAN_INCLUSIVE

Cut any edge or node < the cut_threshold

graspologic.preprocessing.SMALLER_THAN_EXCLUSIVE

Cut any edge or node <= the cut_threshold
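
Assuming each constant holds the lowercase string that cut_process accepts (such as larger_than_inclusive), the comparisons listed above can be sketched as a lookup table. The table CUT_COMPARISONS and the helper is_cut are hypothetical illustrations, not graspologic internals; they transcribe the four descriptions above literally.

```python
import operator

# Hypothetical table (not graspologic's code): maps each cut_process string to
# the comparison described for the matching constant above.
CUT_COMPARISONS = {
    "larger_than_inclusive": operator.gt,   # cut if value > cut_threshold
    "larger_than_exclusive": operator.ge,   # cut if value >= cut_threshold
    "smaller_than_inclusive": operator.lt,  # cut if value < cut_threshold
    "smaller_than_exclusive": operator.le,  # cut if value <= cut_threshold
}


def is_cut(value, cut_threshold, cut_process):
    """Return True if an edge or node with this value would be cut."""
    try:
        compare = CUT_COMPARISONS[cut_process]
    except KeyError:
        raise ValueError(f"unknown cut_process: {cut_process!r}") from None
    return compare(value, cut_threshold)


# A value equal to the threshold shows the inclusive/exclusive difference.
for process in CUT_COMPARISONS:
    print(process, is_cut(0.5, 0.5, process))
```

Run as-is, it prints True only for the two exclusive processes, since by the descriptions above only they cut a value equal to the threshold.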

### Classes

class graspologic.preprocessing.DefinedHistogram

A named tuple containing a histogram and the edges of its bins. bin_edges has one more element than histogram, because it includes the leftmost and rightmost edges as well as every edge in between.

Create a new instance of DefinedHistogram(histogram, bin_edges).

histogram

Alias for field number 0

bin_edges

Alias for field number 1

count(self, value, /)

Return number of occurrences of value.

index(self, value, start=0, stop=9223372036854775807, /)

Return first index of value.

Raises ValueError if the value is not present.
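
A DefinedHistogram behaves like any other named tuple. The sketch below builds a stand-in with the same two fields (an assumption for illustration, so it runs without graspologic installed) to show the length relationship between bin_edges and histogram and access by name or by position.

```python
from collections import namedtuple

# Stand-in with the same field names and order as DefinedHistogram; this is an
# illustration, not the class imported from graspologic.preprocessing.
DefinedHistogram = namedtuple("DefinedHistogram", ["histogram", "bin_edges"])

# Three bins require four edges: the minimum, two interior edges, the maximum.
result = DefinedHistogram(histogram=[2, 0, 1], bin_edges=[0.0, 1.0, 2.0, 3.0])
assert len(result.bin_edges) == len(result.histogram) + 1

# Fields are available by name or by position (field numbers 0 and 1).
assert result.histogram is result[0]
assert result.bin_edges is result[1]

# Pair each count with its left and right bin edges.
for count, left, right in zip(result.histogram, result.bin_edges, result.bin_edges[1:]):
    print(f"{left} to {right}: {count}")

# Like any tuple, it can be unpacked.
counts, edges = result
```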

### Functions

graspologic.preprocessing.cut_edges_by_weight(graph: Union[networkx.classes.graph.Graph, networkx.classes.digraph.DiGraph], cut_threshold: Union[int, float], cut_process: str, weight_attribute: str = 'weight', prune_isolates: bool = False) → Union[networkx.classes.graph.Graph, networkx.classes.digraph.DiGraph]

Thresholds edges by weight: the edges selected by cut_process are removed from a copy of the graph, and that copy is returned.

Parameters:

- graph (Union[nx.Graph, nx.DiGraph]): The graph that will be copied and pruned.
- cut_threshold (Union[int, float]): The threshold for making cuts based on weight.
- cut_process (str): How to make the cut: whether to cut edges whose weight is larger or smaller than cut_threshold, and whether the cut is inclusive or exclusive. Allowed values are larger_than_inclusive, larger_than_exclusive, smaller_than_inclusive and smaller_than_exclusive.
- weight_attribute (str): The name of the weight attribute in each edge's data dictionary. Default is weight.
- prune_isolates (bool): If True, remove any vertex that no longer has an edge. This only prunes vertices that lost edges in this cut; any vertex that was already isolated before the cut is retained. Default is False.

Returns:

- Union[nx.Graph, nx.DiGraph]: A pruned copy, of the same graph type as the one provided.

Notes

Edges without a weight_attribute field will be excluded from these cuts. Enable logging to view any messages about edges without weights.
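
To make the interplay of cut_process, prune_isolates and unweighted edges concrete, here is a minimal pure-Python sketch. It assumes a plain node list and an edge dictionary instead of a networkx graph, and cut_edges_sketch is a hypothetical helper, not graspologic's implementation; its comparisons transcribe the constants at the top of this page.

```python
import logging
import operator

logger = logging.getLogger(__name__)

# Comparisons transcribed from the cut_process constants above (hypothetical table).
CUT_COMPARISONS = {
    "larger_than_inclusive": operator.gt,
    "larger_than_exclusive": operator.ge,
    "smaller_than_inclusive": operator.lt,
    "smaller_than_exclusive": operator.le,
}


def cut_edges_sketch(nodes, edges, cut_threshold, cut_process,
                     weight_attribute="weight", prune_isolates=False):
    """Hypothetical stand-in for cut_edges_by_weight on plain Python data.

    nodes: iterable of vertices; edges: dict mapping (u, v) to an attribute dict.
    Returns new (nodes, edges) copies and leaves the inputs unchanged.
    """
    compare = CUT_COMPARISONS[cut_process]
    kept_edges = {}
    lost_an_edge = set()  # endpoints of edges removed by this cut
    for (u, v), data in edges.items():
        if weight_attribute not in data:
            # Edges without the weight attribute are excluded from the cut.
            logger.debug("edge %r has no %r attribute", (u, v), weight_attribute)
            kept_edges[(u, v)] = dict(data)
        elif compare(data[weight_attribute], cut_threshold):
            lost_an_edge.update((u, v))
        else:
            kept_edges[(u, v)] = dict(data)

    kept_nodes = set(nodes)
    if prune_isolates:
        still_connected = {n for edge in kept_edges for n in edge}
        # Only vertices that lost an edge here can be pruned; earlier isolates stay.
        kept_nodes -= lost_an_edge - still_connected
    return kept_nodes, kept_edges


nodes = ["a", "b", "c", "d", "e", "lonely"]
edges = {
    ("a", "b"): {"weight": 5.0},
    ("b", "c"): {"weight": 0.1},
    ("d", "e"): {},  # no weight: never cut
}
kept_nodes, kept_edges = cut_edges_sketch(
    nodes, edges, cut_threshold=1.0, cut_process="smaller_than_exclusive",
    prune_isolates=True,
)
print(sorted(kept_nodes))  # ['a', 'b', 'd', 'e', 'lonely']
print(sorted(kept_edges))  # [('a', 'b'), ('d', 'e')]
```

Vertex c is pruned because it lost its only edge, while lonely survives because it was isolated before the cut. With graspologic itself, the corresponding call on a networkx graph g would be cut_edges_by_weight(g, cut_threshold=1.0, cut_process="smaller_than_exclusive", prune_isolates=True).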

graspologic.preprocessing.cut_vertices_by_betweenness_centrality(graph: Union[networkx.classes.graph.Graph, networkx.classes.digraph.DiGraph], cut_threshold: Union[int, float], cut_process: str, num_random_samples: Union[int, NoneType] = None, normalized: bool = True, weight_attribute: Union[str, NoneType] = 'weight', include_endpoints: bool = False, random_seed: Union[int, random.Random, numpy.random.mtrand.RandomState, NoneType] = None) → Union[networkx.classes.graph.Graph, networkx.classes.digraph.DiGraph]

Given a graph, a cut_threshold and a cut_process, return a copy of the graph from which every vertex whose betweenness centrality meets the cut criterion has been removed.

The betweenness centrality calculation can use networkx's randomized sampling: set num_random_samples (k in the nomenclature of networkx's betweenness_centrality) to estimate betweenness from a sample of vertices instead of computing it exactly.

Parameters:

- graph (Union[nx.Graph, nx.DiGraph]): The graph that will be copied and pruned.
- cut_threshold (Union[int, float]): The threshold for making cuts based on betweenness centrality.
- cut_process (str): How to make the cut: whether to cut vertices whose betweenness centrality is larger or smaller than cut_threshold, and whether the cut is inclusive or exclusive. Allowed values are larger_than_inclusive, larger_than_exclusive, smaller_than_inclusive and smaller_than_exclusive.
- num_random_samples (Optional[int]): Number of vertices to sample when estimating betweenness. It should be <= len(graph.nodes); larger values give a better approximation. Default is None (exact computation).
- normalized (bool): If True, the betweenness values are normalized by $2/((n-1)(n-2))$ for undirected graphs and by $1/((n-1)(n-2))$ for directed graphs, where n is the number of vertices in the graph. Default is True.
- weight_attribute (Optional[str]): If None, all edge weights are considered equal. Otherwise, the name of the edge attribute used as weight. Default is weight.
- include_endpoints (bool): If True, include the endpoints in the shortest path counts. Default is False.
- random_seed (Optional[Union[int, random.Random, np.random.RandomState]]): Random seed or preconfigured random instance used to select the random samples. Only used if num_random_samples is set. None generates a new random state; specifying a seed or state gives consistent results between runs.

Returns:

- Union[nx.Graph, nx.DiGraph]: A pruned copy, of the same graph type as the one provided.

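
The sketch below shows what the undirected normalization and the vertex cut compute on a toy graph. It is a brute-force pairwise count for undirected, unweighted graphs (equivalent to weight_attribute=None, include_endpoints=False, no sampling), not the Brandes algorithm that networkx's betweenness_centrality uses, and every name in it is hypothetical.

```python
import operator
from collections import deque
from itertools import combinations

# Comparisons transcribed from the cut_process constants above (hypothetical table).
CUT_COMPARISONS = {
    "larger_than_inclusive": operator.gt,
    "larger_than_exclusive": operator.ge,
    "smaller_than_inclusive": operator.lt,
    "smaller_than_exclusive": operator.le,
}


def bfs_path_counts(adj, source):
    """Hop distances and numbers of shortest paths from source (unweighted)."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w], sigma[w] = dist[u] + 1, 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness_sketch(adj, normalized=True):
    """Brute-force betweenness for an undirected, unweighted graph (no endpoints)."""
    nodes = list(adj)
    paths = {s: bfs_path_counts(adj, s) for s in nodes}
    scores = dict.fromkeys(nodes, 0.0)
    for s, t in combinations(nodes, 2):
        dist_s, sigma_s = paths[s]
        if t not in dist_s:
            continue  # no path between s and t
        for v in nodes:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = paths[v]
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                # Fraction of the s-t shortest paths that pass through v.
                scores[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    n = len(nodes)
    if normalized and n > 2:
        scale = 2 / ((n - 1) * (n - 2))  # undirected normalization from above
        scores = {v: c * scale for v, c in scores.items()}
    return scores


def cut_vertices_sketch(adj, scores, cut_threshold, cut_process):
    """Copy of adj without the vertices whose score is cut."""
    compare = CUT_COMPARISONS[cut_process]
    keep = {v for v in adj if not compare(scores[v], cut_threshold)}
    return {v: [w for w in adj[v] if w in keep] for v in keep}


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # the path 0-1-2-3
scores = betweenness_sketch(path)
print(scores)  # vertices 1 and 2 score 2/3, the endpoints score 0.0
print(cut_vertices_sketch(path, scores, 0.5, "larger_than_inclusive"))  # {0: [], 3: []}
```

Each interior vertex of the path lies on two of the six vertex pairs' shortest paths, so its raw score of 2 is scaled by $2/(3 \cdot 2)$ to 2/3.
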
graspologic.preprocessing.cut_vertices_by_degree_centrality(graph: Union[networkx.classes.graph.Graph, networkx.classes.digraph.DiGraph], cut_threshold: Union[int, float], cut_process: str) → Union[networkx.classes.graph.Graph, networkx.classes.digraph.DiGraph]

Given a graph, a cut_threshold and a cut_process, return a copy of the graph from which every vertex whose degree centrality meets the cut criterion has been removed.

Parameters:

- graph (Union[nx.Graph, nx.DiGraph]): The graph that will be copied and pruned.
- cut_threshold (Union[int, float]): The threshold for making cuts based on degree centrality.
- cut_process (str): How to make the cut: whether to cut vertices whose degree centrality is larger or smaller than cut_threshold, and whether the cut is inclusive or exclusive. Allowed values are larger_than_inclusive, larger_than_exclusive, smaller_than_inclusive and smaller_than_exclusive.

Returns:

- Union[nx.Graph, nx.DiGraph]: A pruned copy, of the same graph type as the one provided.

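
As an illustration, assuming degree centrality is computed as networkx's degree_centrality does for an undirected graph (a vertex's degree divided by n - 1), the following sketch applies a smaller_than_exclusive cut to a star. The helper degree_centrality_sketch is hypothetical.

```python
def degree_centrality_sketch(adj):
    """Degree divided by n - 1 for an undirected adjacency dict (illustration only)."""
    n = len(adj)
    if n < 2:
        raise ValueError("this sketch needs at least two vertices")
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
centrality = degree_centrality_sketch(star)
print(centrality)  # the hub scores 1.0, each leaf 1/3

# smaller_than_exclusive cuts every vertex whose value is <= cut_threshold.
cut_threshold = 0.5
removed = {v for v, c in centrality.items() if c <= cut_threshold}
pruned = {
    v: [w for w in nbrs if w not in removed]
    for v, nbrs in star.items()
    if v not in removed
}
print(pruned)  # {'hub': []}
```
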
graspologic.preprocessing.histogram_betweenness_centrality(graph: Union[networkx.classes.graph.Graph, networkx.classes.digraph.DiGraph], bin_directive: Union[int, List[Union[int, float]], numpy.ndarray, str] = 10, num_random_samples: Union[int, NoneType] = None, normalized: bool = True, weight_attribute: Union[str, NoneType] = 'weight', include_endpoints: bool = False, random_seed: Union[int, random.Random, numpy.random.mtrand.RandomState, NoneType] = None) → graspologic.preprocessing.graph_cuts.DefinedHistogram

Generates a histogram of the vertex betweenness centrality of the provided graph. The histogram is computed by numpy's histogram function, so bin selection follows the rules of numpy.histogram().

The betweenness centrality calculation can use networkx's randomized sampling: set num_random_samples (k in the nomenclature of networkx's betweenness_centrality) to estimate betweenness from a sample of vertices instead of computing it exactly.

Parameters:

- graph (Union[nx.Graph, nx.DiGraph]): The graph. No changes will be made to it.
- bin_directive (Union[int, List[Union[float, int]], numpy.ndarray, str]): Passed directly to numpy's histogram (and thus histogram_bin_edges) function. In short: if an int is provided, bin_directive equal-width bins are used. If a sequence is provided, it is used as the bin edges, which must increase monotonically and allow bins of non-uniform width. A numpy.ndarray should have ndim=1 and float or int values. If a str is provided, it names one of numpy's bin-estimation methods, such as 'auto'.
- num_random_samples (Optional[int]): Number of vertices to sample when estimating betweenness. It should be <= len(graph.nodes); larger values give a better approximation. Default is None (exact computation).
- normalized (bool): If True, the betweenness values are normalized by $2/((n-1)(n-2))$ for undirected graphs and by $1/((n-1)(n-2))$ for directed graphs, where n is the number of vertices in the graph. Default is True.
- weight_attribute (Optional[str]): If None, all edge weights are considered equal. Otherwise, the name of the edge attribute used as weight. Default is weight.
- include_endpoints (bool): If True, include the endpoints in the shortest path counts. Default is False.
- random_seed (Optional[Union[int, random.Random, np.random.RandomState]]): Random seed or preconfigured random instance used to select the random samples. Only used if num_random_samples is set. None generates a new random state; specifying a seed or state gives consistent results between runs.

Returns:

- DefinedHistogram: A named tuple containing the histogram and the bin_edges used to compute it.

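
When bin_directive is a sequence, numpy.histogram uses it verbatim as the bin edges. As an illustration of how values are then assigned to bins, here is a pure-Python stand-in; histogram_with_edges is hypothetical, and the input values are the normalized, exact betweenness centralities of a star with three leaves.

```python
import bisect


def histogram_with_edges(values, bin_edges):
    """Count values per bin using numpy.histogram's rules for explicit edges.

    Every bin is half-open [left, right) except the last, which also includes
    its right edge; values outside [bin_edges[0], bin_edges[-1]] are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for x in values:
        if x < bin_edges[0] or x > bin_edges[-1]:
            continue
        if x == bin_edges[-1]:
            counts[-1] += 1  # the last bin is closed on the right
        else:
            counts[bisect.bisect_right(bin_edges, x) - 1] += 1
    return counts


# Normalized betweenness of a star with three leaves: the hub lies on every
# leaf-to-leaf shortest path (1.0), the leaves on none (0.0).
betweenness = {"hub": 1.0, "a": 0.0, "b": 0.0, "c": 0.0}

# Non-uniform widths: two narrow bins below 0.5 and one wider bin above it.
bin_edges = [0.0, 0.25, 0.5, 1.0]
print(histogram_with_edges(betweenness.values(), bin_edges))  # [3, 0, 1]
```

The hub's value equals the final edge and is still counted, because only the last bin includes its right edge.
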
graspologic.preprocessing.histogram_degree_centrality(graph: Union[networkx.classes.graph.Graph, networkx.classes.digraph.DiGraph], bin_directive: Union[int, List[Union[int, float]], numpy.ndarray, str] = 10) → graspologic.preprocessing.graph_cuts.DefinedHistogram

Generates a histogram of the vertex degree centrality of the provided graph. The histogram is computed by numpy's histogram function, so bin selection follows the rules of numpy.histogram().

Parameters:

- graph (Union[nx.Graph, nx.DiGraph]): The graph. No changes will be made to it.
- bin_directive (Union[int, List[Union[float, int]], numpy.ndarray, str]): Passed directly to numpy's histogram (and thus histogram_bin_edges) function. In short: if an int is provided, bin_directive equal-width bins are used. If a sequence is provided, it is used as the bin edges, which must increase monotonically and allow bins of non-uniform width. A numpy.ndarray should have ndim=1 and float or int values. If a str is provided, it names one of numpy's bin-estimation methods, such as 'auto'.

Returns:

- DefinedHistogram: A named tuple containing the histogram and the bin_edges used to compute it.

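
With an int bin_directive and no explicit range, numpy spans the bins from the smallest to the largest value. The hypothetical helper equal_width_edges below sketches that edge computation in pure Python (matching numpy up to floating-point rounding), using the degree centralities of a star with three leaves as input.

```python
def equal_width_edges(values, bins):
    """bins + 1 evenly spaced edges from min(values) to max(values).

    Mirrors numpy's behaviour for an int bin count with no explicit range,
    including widening a zero-width range to [value - 0.5, value + 0.5].
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / bins
    # Interior edges are computed; the last edge is pinned to hi exactly.
    return [lo + i * width for i in range(bins)] + [hi]


# Degree centrality of a star with three leaves: degree / (n - 1).
degrees = {"hub": 3, "a": 1, "b": 1, "c": 1}
centrality = [d / (len(degrees) - 1) for d in degrees.values()]

edges = equal_width_edges(centrality, 3)
print([round(e, 3) for e in edges])  # [0.333, 0.556, 0.778, 1.0]
```

Under numpy's counting rules, where only the last bin includes its right edge, the three leaves (1/3) fall in the first bin and the hub (1.0) in the last, giving a histogram of [3, 0, 1].
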
graspologic.preprocessing.histogram_edge_weight(graph: Union[networkx.classes.graph.Graph, networkx.classes.digraph.DiGraph], bin_directive: Union[int, List[Union[int, float]], numpy.ndarray, str] = 10, weight_attribute: str = 'weight') → graspologic.preprocessing.graph_cuts.DefinedHistogram

Generates a histogram of the edge weights of the provided graph. The histogram is computed by numpy's histogram function, so bin selection follows the rules of numpy.histogram().

Parameters:

- graph (Union[nx.Graph, nx.DiGraph]): The graph. No changes will be made to it.
- bin_directive (Union[int, List[Union[float, int]], numpy.ndarray, str]): Passed directly to numpy's histogram (and thus histogram_bin_edges) function. In short: if an int is provided, bin_directive equal-width bins are used. If a sequence is provided, it is used as the bin edges, which must increase monotonically and allow bins of non-uniform width. A numpy.ndarray should have ndim=1 and float or int values. If a str is provided, it names one of numpy's bin-estimation methods, such as 'auto'.
- weight_attribute (str): The name of the weight attribute in each edge's data dictionary. Default is weight.

Returns:

- DefinedHistogram: A named tuple containing the histogram and the bin_edges used to compute it.

Notes

Edges without a weight_attribute field will be excluded from this histogram. Enable logging to view any messages about edges without weights.
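
In other words, the histogram only sees edges that carry weight_attribute. The hypothetical pure-Python helper collect_edge_weights sketches that filtering on an edge dictionary rather than a networkx graph; the DEBUG level it logs at is an assumption, and graspologic's own messages may use a different level.

```python
import logging

logger = logging.getLogger(__name__)


def collect_edge_weights(edges, weight_attribute="weight"):
    """Gather the values a weight histogram would be computed over.

    edges maps (u, v) to an attribute dict. Edges lacking weight_attribute are
    skipped and reported through logging instead of raising an error.
    """
    weights = []
    for (u, v), data in edges.items():
        if weight_attribute in data:
            weights.append(data[weight_attribute])
        else:
            logger.debug("skipping edge %r: no %r attribute", (u, v), weight_attribute)
    return weights


edges = {
    ("a", "b"): {"weight": 2.0, "capacity": 7},
    ("b", "c"): {"weight": 0.5},
    ("c", "d"): {"label": "unweighted"},
}
print(collect_edge_weights(edges))              # [2.0, 0.5]
print(collect_edge_weights(edges, "capacity"))  # [7]
```

Calling logging.basicConfig(level=logging.DEBUG) before running the sketch makes the skipped-edge messages visible.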