[](https://www.uni-heidelberg.de)

April 15, 2026 ยท View on GitHub

CluStRE v1.0 License: MIT C++ CMake GitHub Release Homebrew Linux macOS GitHub Stars GitHub Issues Last Commit arXiv Heidelberg University

CluStRE Logo

CluStRE (Streaming Graph Clustering with Multi-Stage Refinement) is a streaming graph clustering algorithm that achieves near in-memory quality while using a fraction of the memory. Part of the KaHIP organization.

Python Interface: An easy-to-use Python interface for this software is available in CHSZLabLib.

What it solvesGet good graph clusterings super fast, or cluster graphs too large to fit in memory
ObjectiveMaximize modularity via streaming
Key results89.8% better quality, 2.6x faster, <2/3 memory vs. state-of-the-art streaming; achieves >96% of in-memory quality (Louvain)
AlgorithmStreaming assignment + quotient graph optimization + restreaming with local search
InterfacesCLI
RequiresMPI, C++20 compiler, CMake 3.24+

Quick Start

Install

MethodCommand
Homebrew (macOS/Linux)brew install KaHIP/kahip/clustre
Build from source./compile.sh

Run

# Fast streaming clustering (light mode)
./deploy/clustre mygraph.graph --one_pass_algorithm=modularity --mode=light

# Best quality (strong mode: streaming + evolutionary + local search refinement)
./deploy/clustre mygraph.graph --one_pass_algorithm=modularity --ext_clustering_algorithm=VieClus --mode=strong

Modes

CluStRE offers four modes that trade off speed, memory, and quality:

ModePhasesUse case
lightOne-pass streamingFastest, lowest memory
light_plusStreaming + restream with local searchGood balance of speed and quality
evoStreaming + memetic quotient graph clusteringHigh quality without restreaming
strongStreaming + memetic clustering + restream with local searchBest quality, highest resource usage

Command Line Usage

./deploy/clustre <graph-file> [options]

Required options

OptionDescription
<graph-file>Path to graph in METIS format (see Graph Format)
--one_pass_algorithm=modularityObjective function for streaming (modularity is built-in)
--mode=<mode>Clustering mode: light, light_plus, evo, strong

Common options

OptionDefaultDescription
--ext_clustering_algorithm=VieClusVieClusIn-memory algorithm for quotient graph (used in evo and strong modes)
--ext_algorithm_time=<seconds>300Time limit for the in-memory clustering on the quotient graph
--output_path=<directory>current dirDirectory for output files
--suppress_file_outputoffSuppress writing the clustering assignment file
--evaluateoffCompute and store solution quality metrics
--cluster_fraction=<0..1>~5M clustersCap max clusters to fraction * n
--cut_off=<0..1>0.05Stop local search when gain < cutoff * estimated_modularity
--ls_time_limit=<seconds>600Time limit for local search phase
--helpPrint all available options

Output files

FileDescription
<graph>_<mode>.txtClustering assignment (one cluster ID per line, 0-indexed)
<graph>_<mode>.binFlatBuffer binary with quality metrics, running time, memory usage

Example workflow

# 1. Cluster a graph in strong mode with 2-minute VieClus time limit
./deploy/clustre examples/rgg_n_2_15_s0.graph \
    --one_pass_algorithm=modularity \
    --ext_clustering_algorithm=VieClus \
    --ext_algorithm_time=120 \
    --mode=strong \
    --evaluate

# 2. Result files:
#    rgg_n_2_15_s0_strong.txt   (cluster assignments)
#    rgg_n_2_15_s0_strong.bin   (quality metrics)

Graph Format

CluStRE uses the METIS graph format, the same format used by KaHIP, Metis, Chaco, and the 10th DIMACS Implementation Challenge.

Input format

A plain text file with n + 1 lines (excluding comments). Lines starting with % are comments.

Header line:

n m [f]
  • n = number of vertices, m = number of undirected edges
  • f = format flag (optional): 0 = unweighted, 1 = edge weights, 10 = node weights, 11 = both

Vertex lines (one per vertex):

v1 [w1] v2 [w2] ...

where v_i are neighbor IDs (1-indexed) and w_i are edge weights (if f=1 or f=11).

Example (4 vertices, 5 edges, unweighted):

4 5
2 3
1 3 4
1 2 4
2 3

Requirements

  • Undirected graph: every edge must appear in both adjacency lists
  • No self-loops or parallel edges
  • Vertex IDs are 1-indexed in the file
  • Edge weights must be strictly positive; vertex weights must be non-negative

64-bit support

CluStRE supports 64-bit vertex and edge IDs by default. To disable, set 64BITMODE and 64BITVERTEXMODE to OFF in CMakeLists.txt and rebuild.

For a full description of the METIS format, see the KaHIP manual.


How It Works

CluStRE processes a graph in a streaming fashion, requiring only a fraction of the memory needed by in-memory approaches:

  1. Streaming phase: Vertices are streamed one by one and assigned to clusters that maximize modularity gain. A quotient graph (cluster-level summary) is built incrementally.
  2. Quotient graph optimization (evo/strong modes): The quotient graph is clustered using VieClus, a memetic algorithm that finds high-quality modularity-optimizing clusterings.
  3. Restreaming with local search (light_plus/strong modes): The graph is re-streamed and vertices are reassigned using local search, refining cluster boundaries based on the improved quotient graph clustering.

This combination achieves over 96% of the quality of in-memory algorithms like Louvain while handling graphs that do not fit in memory.


Building from Source

Requirements

  • C++20 compiler (g++ 10+)
  • CMake 3.24+
  • MPI (OpenMPI or Intel MPI)

All other dependencies (VieClus, FlatBuffers, STXXL, KaGen, abseil, robin-hood-hashing) are fetched automatically during the build.

git clone https://github.com/KaHIP/CluStRE.git
cd CluStRE
./compile.sh

The binary is placed in ./deploy/clustre.

Alternatively, use the standard CMake process:

mkdir build && cd build
cmake -DCMAKE_CXX_COMPILER=g++ -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DCMAKE_CXX_FLAGS="-w" ../
make -j$(nproc)

Data References

Graphs used in our experimental evaluation were sourced from:

SourceURL
SNAP Datasethttps://snap.stanford.edu/data/
10th DIMACS Implementation Challengehttps://sites.cc.gatech.edu/dimacs10/index.shtml
Laboratory for Web Algorithmicshttps://law.di.unimi.it/
Network Repositoryhttps://networkrepository.com/
Torch Geometrichttps://pytorch-geometric.readthedocs.io/

Graphs were converted to METIS format with parallel edges, self-loops, and directions removed, and unit weights assigned to all nodes and edges.


ProjectDescription
VieClusState-of-the-art memetic algorithm for highest modularity values
KaHIPKarlsruhe High Quality Graph Partitioning (flagship framework)
HeiStreamBuffered streaming graph and edge partitioner
KaHyParKarlsruhe Hypergraph Partitioning

Licence

The program is licenced under MIT licence. If you publish results using our algorithms, please acknowledge our work by quoting the following paper:

@inproceedings{ChhabraPS25,
  author    = {Chhabra, Adil and Peretz, Shai Dorian and Schulz, Christian},
  title     = {{CluStRE: Streaming Graph Clustering with Multi-Stage Refinement}},
  booktitle = {23rd International Symposium on Experimental Algorithms (SEA 2025)},
  series    = {LIPIcs},
  volume    = {338},
  pages     = {11:1--11:20},
  publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"{u}}r Informatik},
  year      = {2025},
  doi       = {10.4230/LIPIcs.SEA.2025.11}
}