Partition¶
Modularity and Component Modularity¶
-
graspologic.partition.
modularity
(graph: networkx.classes.graph.Graph, partitions: Dict[Any, int], weight_attribute: str = 'weight', resolution: float = 1.0) → float[source]¶ Given an undirected graph and a dictionary of vertices to community ids, calculate the modularity.
Parameters: graph : nx.Graph
An undirected graph
partitions : Dict[Any, int]
A dictionary representing a community partitioning scheme with the keys being the vertex and the value being a community id.
weight_attribute : str
The edge data attribute on the graph that contains a float weight for the edge.
resolution : float
The resolution to use when calculating the modularity.
Returns: Dict[int, float]
A dictionary of the community id to the modularity component of that community
Raises: TypeError
If
graph
is not a networkx Graph or Ifpartitions
is not a dictionary or Ifresolution
is not a floatValueError
If
graph
is unweighted Ifgraph
is directed Ifgraph
is a multigraphReferences
[1] https://en.wikipedia.org/wiki/Modularity_(networks)
-
graspologic.partition.
modularity_components
(graph: networkx.classes.graph.Graph, partitions: Dict[Any, int], weight_attribute: str = 'weight', resolution: float = 1.0) → Dict[int, float][source]¶ Given an undirected, weighted graph and a community partition dictionary, calculates a modularity quantum for each community ID. The sum of these quanta is the modularity of the graph and partitions provided.
Parameters: graph : nx.Graph
An undirected graph
partitions : Dict[Any, int]
A dictionary representing a community partitioning scheme with the keys being the vertex and the value being a community id.
weight_attribute : str
The edge data attribute on the graph that contains a float weight for the edge.
resolution : float
The resolution to use when calculating the modularity.
Returns: Dict[int, float]
A dictionary of the community id to the modularity component of that community
Raises: TypeError
If
graph
is not a networkx Graph or Ifpartitions
is not a dictionary or Ifresolution
is not a floatValueError
If
graph
is unweighted Ifgraph
is directed Ifgraph
is a multigraph
Leiden and Hierarchical Leiden¶
-
graspologic.partition.
leiden
(graph: Union[List[Tuple[Any, Any, Union[float, int]]], networkx.classes.graph.Graph, numpy.ndarray, scipy.sparse.csr.csr_matrix], starting_communities: Union[Dict[str, int], NoneType] = None, extra_forced_iterations: int = 0, resolution: float = 1.0, randomness: float = 0.001, use_modularity: bool = True, random_seed: Union[int, NoneType] = None, weight_attribute: str = 'weight', is_weighted: Union[bool, NoneType] = None, weight_default: float = 1.0, check_directed: bool = True) → Dict[str, int][source]¶ Leiden is a global network partitioning algorithm. Given a graph, it will iterate through the network node by node, and test for an improvement in our quality maximization function by speculatively joining partitions of each neighboring node.
This process continues until no moves are made that increases the partitioning quality.
Parameters: graph : Union[List[Tuple[Any, Any, Union[int, float]]], nx.Graph, np.ndarray, scipy.sparse.csr.csr_matrix]
A graph representation, whether a weighted edge list referencing an undirected graph, an undirected networkx graph, or an undirected adjacency matrix in either numpy.ndarray or scipy.sparse.csr.csr_matrix form.
starting_communities : Optional[Dict[str, int]]
Default is
None
. An optional community mapping dictionary that contains the string representation of the node and the community it belongs to. Note this map must contain precisely the same nodes as the graph and every node must have a positive community id. This mapping forms the starting point of Leiden clustering and can be very useful in saving the state of a previous partition schema from a previous graph and then adjusting the graph based on new temporal data (additions, removals, weight changes, connectivity changes, etc). New nodes should get their own unique community positive integer, but the original partition can be very useful to speed up future runs of leiden. If no community map is provided, the default behavior is to create a node community identity map, where every node is in their own community.extra_forced_iterations : int
Default is
0
. Leiden will run until a maximum quality score has been found for the node clustering and no nodes are moved to a new cluster in another iteration. As there is an element of randomness to the Leiden algorithm, it is sometimes useful to setextra_forced_iterations
to a number larger than 0 where the entire process is forced to attempt further refinement.resolution : float
Default is
1.0
. Higher resolution values lead to more communities and lower resolution values leads to fewer communities. Must be greater than 0.randomness : float
Default is
0.001
. The larger the randomness value, the more exploration of the partition space is possible. This is a major difference from the Louvain algorithm, which is purely greedy in the partition exploration.use_modularity : bool
Default is
True
. IfFalse
, will use a Constant Potts Model (CPM).random_seed : Optional[int]
Default is
None
. Can provide an optional seed to the PRNG used in Leiden for deterministic output.weight_attribute : str
Default is
weight
. Only used when creating a weighed edge list of tuples when the source graph is a networkx graph. This attribute corresponds to the edge data dict key.is_weighted : Optional[bool]
Default is
None
. Only used when creating a weighted edge list of tuples when the source graph is an adjacency matrix. Thegraspologic.utils.to_weighted_edge_list()
function will scan these matrices and attempt to determine whether it is weighted or not. This flag can short circuit this test and the values in the adjacency matrix will be treated as weights.weight_default : float
Default is
1.0
. If the graph is a networkx graph and the graph does not have a fully weighted sequence of edges, this default will be used. If the adjacency matrix is found or specified to be unweighted, this weight_default will be used for every edge.check_directed : bool
Default is
True
. If the graph is an adjacency matrix, we will attempt to ascertain whether it is directed or undirected. As our leiden implementation is only known to work with an undirected graph, this function will raise an error if it is found to be a directed graph. If you know it is undirected and wish to avoid this scan, you can set this value toFalse
and only the lower triangle of the adjacency matrix will be used to generate the weighted edge list.Returns: Dict[str, int]
The results of running leiden over the provided graph, a dictionary containing mappings of node -> community id. The keys in the dictionary are the string representations of the nodes.
Raises: - ValueError
- TypeError
Notes
This function is implemented in the graspologic-native Python module, a module written in Rust for Python.
References
[1] Traag, V.A.; Waltman, L.; Van, Eck N.J. "From Louvain to Leiden: guaranteeing well-connected communities", Scientific Reports, Vol. 9, 2019 [2] https://github.com/microsoft/graspologic-native
-
class
graspologic.partition.
HierarchicalCluster
(node, cluster, parent_cluster, level, is_final_cluster)[source]¶ Create new instance of HierarchicalCluster(node, cluster, parent_cluster, level, is_final_cluster)
-
node
¶ Alias for field number 0
-
cluster
¶ Alias for field number 1
-
parent_cluster
¶ Alias for field number 2
-
level
¶ Alias for field number 3
-
is_final_cluster
¶ Alias for field number 4
-
count
(self, value, /)¶ Return number of occurrences of value.
-
index
(self, value, start=0, stop=9223372036854775807, /)¶ Return first index of value.
Raises ValueError if the value is not present.
-
-
graspologic.partition.
hierarchical_leiden
(graph: Union[List[Tuple[str, str, float]], networkx.classes.graph.Graph], max_cluster_size: int = 1000, starting_communities: Union[Dict[str, int], NoneType] = None, extra_forced_iterations: int = 0, resolution: float = 1.0, randomness: float = 0.001, use_modularity: bool = True, random_seed: Union[int, NoneType] = None, weight_attribute: str = 'weight', is_weighted: Union[bool, NoneType] = None, weight_default: float = 1.0, check_directed: bool = True) → List[graspologic.partition.leiden.HierarchicalCluster][source]¶ Leiden is a global network partitioning algorithm. Given a graph, it will iterate through the network node by node, and test for an improvement in our quality maximization function by speculatively joining partitions of each neighboring node.
This process continues until no moves are made that increases the partitioning quality.
Unlike the function
graspologic.partition.leiden()
, this function does not stop after maximization has been achieved. On some large graphs, it's useful to identify particularly large communities whose membership counts exceedmax_cluster_size
and induce a subnetwork solely out of that community. This subnetwork is then treated as a wholly separate entity, leiden is run over it, and the new, smaller communities are then mapped into the original community map space.The results also differ substantially; the returned List[HierarchicalCluster] is more of a log of state at each level. All HierarchicalClusters at level 0 should be considered to be the results of running
graspologic.partition.leiden()
. Every community whose membership is greater thanmax_cluster_size
will then also have entries where level == 1, and so on until no communities are greater in population thanmax_cluster_size
OR we are unable to break them down any further.Once a node's membership registration in a community cannot be changed any further, it is marked with the flag
graspologic.partition.HierarchicalCluster.is_final_cluster = 1
.Parameters: graph : Union[List[Tuple[Any, Any, Union[int, float]]], nx.Graph, np.ndarray, scipy.sparse.csr.csr_matrix]
A graph representation, whether a weighted edge list referencing an undirected graph, an undirected networkx graph, or an undirected adjacency matrix in either numpy.ndarray or scipy.sparse.csr.csr_matrix form.
max_cluster_size : int
Default is
1000
. Any partition or cluster with membership >=max_cluster_size
will be isolated into a subnetwork. This subnetwork will be used for a new leiden global partition mapping, which will then be remapped back into the global space after completion. Once all clusters with membership >=max_cluster_size
have been completed, the level increases and the partition scheme is scanned again for any new clusters with membership >=max_cluster_size
and the process continues until every cluster's membership is <max_cluster_size
or if they cannot be broken into more than one new community.starting_communities : Optional[Dict[str, int]]
Default is
None
. An optional community mapping dictionary that contains the string representation of the node and the community it belongs to. Note this map must contain precisely the same nodes as the graph and every node must have a positive community id. This mapping forms the starting point of Leiden clustering and can be very useful in saving the state of a previous partition schema from a previous graph and then adjusting the graph based on new temporal data (additions, removals, weight changes, connectivity changes, etc). New nodes should get their own unique community positive integer, but the original partition can be very useful to speed up future runs of leiden. If no community map is provided, the default behavior is to create a node community identity map, where every node is in their own community.extra_forced_iterations : int
Default is
0
. Leiden will run until a maximum quality score has been found for the node clustering and no nodes are moved to a new cluster in another iteration. As there is an element of randomness to the Leiden algorithm, it is sometimes useful to setextra_forced_iterations
to a number larger than 0 where the entire process is forced to attempt further refinement.resolution : float
Default is
1.0
. Higher resolution values lead to more communities and lower resolution values leads to fewer communities. Must be greater than 0.randomness : float
Default is
0.001
. The larger the randomness value, the more exploration of the partition space is possible. This is a major difference from the Louvain algorithm, which is purely greedy in the partition exploration.use_modularity : bool
Default is
True
. IfFalse
, will use a Constant Potts Model (CPM).random_seed : Optional[int]
Default is
None
. Can provide an optional seed to the PRNG used in Leiden for deterministic output.weight_attribute : str
Default is
weight
. Only used when creating a weighed edge list of tuples when the source graph is a networkx graph. This attribute corresponds to the edge data dict key.is_weighted : Optional[bool]
Default is
None
. Only used when creating a weighted edge list of tuples when the source graph is an adjacency matrix. Thegraspologic.utils.to_weighted_edge_list()
function will scan these matrices and attempt to determine whether it is weighted or not. This flag can short circuit this test and the values in the adjacency matrix will be treated as weights.weight_default : float
Default is
1.0
. If the graph is a networkx graph and the graph does not have a fully weighted sequence of edges, this default will be used. If the adjacency matrix is found or specified to be unweighted, this weight_default will be used for every edge.check_directed : bool
Default is
True
. If the graph is an adjacency matrix, we will attempt to ascertain whether it is directed or undirected. As our leiden implementation is only known to work with an undirected graph, this function will raise an error if it is found to be a directed graph. If you know it is undirected and wish to avoid this scan, you can set this value toFalse
and only the lower triangle of the adjacency matrix will be used to generate the weighted edge list.Returns: List[HierarchicalCluster]
The results of running hierarchical leiden over the provided graph, a list of HierarchicalClusters identifying the state of every node and cluster at each level. The function
graspologic.partition.HierarchicalCluster.final_hierarchical_clustering()
can be used to create a dictionary mapping of node -> cluster IDRaises: - ValueError
- TypeError
Notes
This function is implemented in the graspologic-native Python module, a module written in Rust for Python.
References
[1] Traag, V.A.; Waltman, L.; Van, Eck N.J. "From Louvain to Leiden: guaranteeing well-connected communities",Scientific Reports, Vol. 9, 2019 [2] https://github.com/microsoft/graspologic-native