Streaming Graph Partitioning (HeiStream)

March 15, 2026 ยท View on GitHub

Original Repository: https://github.com/KaHIP/HeiStream


Overview

HeiStream is a streaming graph partitioning algorithm designed for graphs that are too large to fit entirely in memory. Nodes and their adjacencies are presented sequentially, and each node is assigned to a block upon arrival (or after a bounded buffer delay).

HeiStream requires only O(n + B) memory, where B is the buffer size, compared to O(n + m) for full in-memory methods. It supports three operational modes:

ModeDescription
FennelDirect one-pass assignment (max_buffer_size = 0 or 1)
BuffCutBuffered assignment with local optimization (max_buffer_size > 1)
RestreamingMultiple passes over the stream for improved quality (num_streams_passes > 1)

CHSZLabLib provides two interfaces: a batch static method and a stateful streaming class.


Decomposition.stream_partition()

Partition a graph using HeiStream when the full graph is available in memory.

Signature

Decomposition.stream_partition(
    g: Graph,
    k: int = 2,
    imbalance: float = 3.0,
    seed: int = 0,
    max_buffer_size: int = 0,
    batch_size: int = 0,
    num_streams_passes: int = 1,
    run_parallel: bool = False,
    suppress_output: bool = True,
) -> StreamPartitionResult

Parameters

ParameterTypeDefaultDescription
gGraphrequiredInput graph (undirected, unweighted). Edge weights are ignored.
kint2Number of partitions (must be >= 2).
imbalancefloat3.0Allowed imbalance in percent (3.0 = 3%).
seedint0Random seed for reproducibility.
max_buffer_sizeint0Buffer size for BuffCut mode. 0 or 1 = direct Fennel (no buffer). Larger values enable priority-buffer mode.
batch_sizeint0MLP batch size for model-based partitioning within the buffer. 0 = HeiStream's default.
num_streams_passesint1Number of streaming passes. More passes improve quality at the cost of runtime.
run_parallelboolFalseUse the parallel 3-thread pipeline (I/O, PQ, partition).
suppress_outputboolTrueSuppress C++ stdout/stderr output.

Returns

StreamPartitionResult

FieldTypeDescription
assignmentnp.ndarray (int32)Partition ID for each node (0-indexed).

Exceptions

ExceptionCondition
ValueErrork < 2 or imbalance < 0.

Example

from chszlablib import Graph, Decomposition

g = Graph.from_metis("large_graph.graph")
result = Decomposition.stream_partition(g, k=8, imbalance=3.0, num_streams_passes=2)
print(f"Partition assignment: {result.assignment}")

HeiStreamPartitioner (Stateful Streaming Class)

For true streaming scenarios where nodes arrive one at a time (e.g., from a network stream or database cursor).

Constructor

HeiStreamPartitioner(
    k: int = 2,
    imbalance: float = 3.0,
    seed: int = 0,
    max_buffer_size: int = 0,
    batch_size: int = 0,
    num_streams_passes: int = 1,
    run_parallel: bool = False,
    suppress_output: bool = True,
)

Methods

MethodDescription
new_node(node, neighbors)Add a node with its neighbor list to the stream. Nodes can be added in any order with non-contiguous IDs.
partition()Run HeiStream on all accumulated nodes and return a StreamPartitionResult.
reset()Clear all added nodes while retaining configuration.

Example

from chszlablib import HeiStreamPartitioner

hs = HeiStreamPartitioner(k=4, imbalance=3.0, max_buffer_size=1000)
hs.new_node(0, [1, 2])
hs.new_node(1, [0, 3])
hs.new_node(2, [0])
hs.new_node(3, [1])
result = hs.partition()
print(result.assignment)  # array of partition IDs

Performance Disclaimer

This Python interface wraps the HeiStream C++ library via pybind11. While convenient for prototyping and integration into Python workflows, there is inherent overhead from the Python/C++ boundary and data conversion. For maximum performance on large-scale streaming instances, use the original C++ implementation directly from the HeiStream repository.


References

@article{DBLP:journals/jea/FarajS22,
  author  = {Marcelo Fonseca Faraj and Christian Schulz},
  title   = {Buffered Streaming Graph Partitioning},
  journal = {{ACM} J. Exp. Algorithmics},
  volume  = {27},
  pages   = {1.10:1--1.10:26},
  year    = {2022},
  doi     = {10.1145/3546911}
}