Correlation Clustering (SCC)
March 15, 2026 ยท View on GitHub
Original Repository: https://github.com/ScalableCorrelationClustering/ScalableCorrelationClustering
Overview
Correlation clustering operates on graphs with signed edge weights: positive weights indicate similarity, negative weights indicate dissimilarity. The goal is to partition the vertex set into an arbitrary number of clusters that minimizes the total disagreements -- positive edges that cross cluster boundaries plus negative edges that remain within the same cluster.
Unlike standard clustering, the number of clusters is not fixed a priori but determined automatically by the optimization. CHSZLabLib provides two SCC variants:
| Method | Description |
|---|---|
correlation_clustering() | Multilevel label propagation (fast) |
evolutionary_correlation_clustering() | Memetic evolutionary algorithm (higher quality, longer runtime) |
Decomposition.correlation_clustering()
Signature
Decomposition.correlation_clustering(
g: Graph,
seed: int = 0,
time_limit: float = 0,
) -> CorrelationClusteringResult
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
g | Graph | required | Input graph with signed edge weights. Positive = similarity, negative = dissimilarity. |
seed | int | 0 | Random seed for reproducibility. |
time_limit | float | 0 | Time limit in seconds (0 = no limit). |
Returns
CorrelationClusteringResult
| Field | Type | Description |
|---|---|---|
edge_cut | int | Number of disagreements. |
num_clusters | int | Number of clusters found. |
assignment | np.ndarray (int32) | Cluster ID for each node (0-indexed). |
Example
from chszlablib import Graph, Decomposition
g = Graph(num_nodes=4)
g.add_edge(0, 1, weight=1) # similar
g.add_edge(1, 2, weight=-1) # dissimilar
g.add_edge(2, 3, weight=1) # similar
g.add_edge(0, 3, weight=-1) # dissimilar
g.finalize()
result = Decomposition.correlation_clustering(g, seed=42)
print(f"Clusters: {result.num_clusters}, disagreements: {result.edge_cut}")
Decomposition.evolutionary_correlation_clustering()
A higher-quality variant that uses a population-based memetic evolutionary algorithm with recombination and multilevel local search.
Signature
Decomposition.evolutionary_correlation_clustering(
g: Graph,
seed: int = 0,
time_limit: float = 5.0,
) -> CorrelationClusteringResult
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
g | Graph | required | Input graph with signed edge weights. |
seed | int | 0 | Random seed for reproducibility. |
time_limit | float | 5.0 | Time budget in seconds for the evolutionary search. |
Returns
CorrelationClusteringResult (same fields as above).
Example
from chszlablib import Graph, Decomposition
g = Graph.from_metis("signed_graph.graph")
# Fast version
fast = Decomposition.correlation_clustering(g, seed=42)
print(f"Fast: {fast.num_clusters} clusters, {fast.edge_cut} disagreements")
# Evolutionary refinement
evo = Decomposition.evolutionary_correlation_clustering(g, time_limit=30.0, seed=42)
print(f"Evo: {evo.num_clusters} clusters, {evo.edge_cut} disagreements")
Performance Disclaimer
This Python interface wraps the SCC C++ library via pybind11. While convenient for prototyping and integration into Python workflows, there is inherent overhead from the Python/C++ boundary and data conversion. For maximum performance on large-scale instances, use the original C++ implementation directly from the SCC repository.
References
@inproceedings{DBLP:conf/alenex/HausbergerF025,
author = {Felix Hausberger and Marcelo Fonseca Faraj and Christian Schulz},
title = {Scalable Multilevel and Memetic Signed Graph Clustering},
booktitle = {Proceedings of the 27th Symposium on Algorithm Engineering and Experiments,
{ALENEX} 2025},
pages = {81--94},
publisher = {{SIAM}},
year = {2025},
doi = {10.1137/1.9781611978339.7}
}