[](https://www.uni-heidelberg.de)
May 11, 2026 · View on GitHub
SharedMap v1.1

About SharedMap
SharedMap is a parallel shared-memory algorithm for hierarchical process mapping. Process Mapping concerns itself with mapping tasks of a task graph, where weighted vertices represent tasks and weighted edges the amount of communicated data, to the cores of a supercomputer. In hierarchical process mapping the hierarchy of the supercomputer (often islands, racks, nodes and cores) is exploited for more efficient mapping algorithms. Part of the KaHIP organization.
Python Interface: An easy-to-use Python interface for this software is available in CHSZLabLib.
General Process Mapping
Given
- Undirected graph with
- Vertex weights that represent the workload of task
- Edge weights that represent the amount of data to be transferred between task and
- The graph is also representable as a communication matrix
- Hardware topology matrix of cores
- An entry represents the cost of sending one unit of data from core to core
The objective of process mapping is to determine a mapping such that
-
- no core has too much work (the balancing constraint)
-
- is minimized
- is the cost of transferring the data from task to task if task is mapped to core and task is mapped to core .
Hierarchical Process Mapping
In Hierarchical Process Mapping the hardware topology matrix is described via the hierarchy and the distance . This hierarchy specifies that each processor contains PEs, each node contains processors, each rack contains nodes, and so on. The total number of PEs is . Additionally, the sequence describes the communication cost between the different PEs. Two PEs on the same processor have distance , two PEs on the same node but on different processor have distance , two PEs in the same rack but on different nodes have distance , and so forth.
Results
SharedMap offers State-Of-The-Art solution quality among available parallel mapping algorithms. It has better quality while also being slightly faster. See the left figure.
Even in the serial case, it is stronger and faster than the previous best algorithm. See the right figure. For more information on the algorithm, we refer to our work (ACDA25).

Requirements
This project utilizes the KaHIP and Mt-KaHyPar library. The requirements needed for both of these projects carry over to this project.
- A Linux operating system (others have not yet been tested).
- A modern compiler that supports C++17, such as
g++(others have not yet been tested). - The cmake build system (>=3.16).
- The Portable Hardware Locality library.
- The Boost - Program Options library will be automatically downloaded by Mt-KaHyPar (
-DKAHYPAR_DOWNLOAD_BOOST=ON). - The Intel Thread Building Blocks library will be automatically downloaded by Mt-KaHyPar (
-DKAHYPAR_DOWNLOAD_TBB=ON) and SharedMap (-DSHAREDMAP_DOWNLOAD_TBB=ON).
The following command will install (most of) the required dependencies on a Ubuntu machine:
sudo apt-get install libtbb-dev libhwloc-dev libboost-program-options-dev
Installation
Install via Homebrew (Linux)
brew install KaHIP/kahip/sharedmap
Automatic
The script build.sh will automatically install the binary and the library.
It will first install the Mt-KaHyPar library and afterward build the project.
Manually
Alternatively, you can build the project by hand.
First install Mt-KaHyPar into extern/mt_kahypar_local via
mkdir -p extern/mt_kahypar_local
mkdir -p extern/mt-kahypar/build
cd extern/mt-kahypar/build
cmake .. -DCMAKE_POSITION_INDEPENDENT_CODE=ON -DCMAKE_BUILD_TYPE=Release -DKAHYPAR_DOWNLOAD_TBB=ON -DKAHYPAR_DOWNLOAD_BOOST=ON -DKAHYPAR_ENABLE_THREAD_PINNING=OFF -DKAHYPAR_DISABLE_ASSERTIONS=ON -DCMAKE_INSTALL_PREFIX=$(pwd)/../../mt_kahypar_local
make -j install.mtkahypar
then build SharedMap via
mkdir -p build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DSHAREDMAP_DOWNLOAD_TBB=ON
cmake --build . --parallel --target SharedMap # the executable
cmake --build . --parallel --target sharedmap # the library
The binary SharedMap and the library libsharedmap.so will be present in the build folder.
The include files for the library are present in the include folder.
Usage
Python Interface
Build and install locally:
pip install .
Use in Python:
import sharedmap as sm
# Define graph in CSR format
v_weights = [1, 1, 1, 1, 1, 1, 1, 1]
adj_ptrs = [0, 3, 6, 8, 12, 16, 19, 21, 22]
adj_weights = [1] * 22
adj = [1, 2, 3, 0, 3, 4, 0, 3, 0, 1, 2, 5, 1, 5, 6, 7, 3, 4, 6, 4, 5, 4]
# Partition
comm_cost, partition = sm.partition_graph(
v_weights, adj_ptrs, adj_weights, adj,
hierarchy=[2, 2], distance=[1, 10],
imbalance=0.03, n_threads=1
)
See python/README.md for detailed Python documentation.
Command Line Interface
Call SharedMap in the build folder:
./build/SharedMap -g <inpath-graph> -m <outpath-parition> -h <hierarchy> -d <distance> -e <imbalance (e.g. 0.03)> -c {fast|eco|strong} -t <# threads> -s {naive|layer|queue|nb_layer} --seed <seed>
Configuration
The available command line arguments and a short description.
You can also use ./build/SharedMap --help for a list of available parameters.
[ -g | --graph ] <inpath-graph> : Filepath to a graph in Metis graph format.
[ -m | --mapping ] <outpath-partition> : Path to the file that will hold the resulting partition. Any existing file will be overwritten.
[ -h | --hierarchy ] <hierarchy> : The hierarchy of the supercomputer in the format a:b:c: ... e.g., 4:8:6 .
[ -d | --distance ] <distance> : The distance of the processors in the format a:b:c: ... e.g., 1:10:100 .
[ -e | --imbalance ] <imbalance> : The maximum allowed imbalance per block e.g., 0.03 allows fo a maximum imbalance of 3%.
[ -c | --config ] <config> : Which partitioning configuration to use. Allowed value are {fast, eco, strong}.
[ -t | --threads ] <# threads> : The number of threads to use.
[ -s | --strategy ] <strategy> : Which thread distribution strategy to use. Allowed value are {naive, layer, queue, nb_layer}.
Optional:
[ --seed ] <seed> : Seed to diversiy partitioning results. If no seed is provided, a random one will be generated.
Example
The graph graphs/big.graph is partitioned on a supercomputer with a hierarchy of 4:8:6 with distances 1:10:100 and an allowed imbalance of 4%.
As the configuration we choose fast with 10 threads and the queue distribution strategy.
As the seed we choose 13.
The resulting partition is stored in results/mapping.txt.
./build/SharedMap -g graphs/big.graph -m results/mapping.txt -h 4:8:6 -d 1:10:100 -e 0.04 -c fast -t 10 -s queue --seed 13
C++ Interface
SharedMap can be used as a C++ library. You can install it via
cmake --build . --parallel --target sharedmap
Via the interface include/libsharedmap.h hierarchical multisection can be used.
Here is a small exmaple:
#include <iostream>
#include "SharedMap/include/libsharedmap.h"
int main() {
/**
* The graph.
* 0 -- 1 -- 4 -- 7
* | \ | | \
* | \ | | \
* 2 -- 3 -- 5 -- 6
*/
int n = 8; // eight vertices
int v_weights[] = {1, 1, 1, 1, 1, 1, 1, 1}; // each vertex has weight 1
// the adjacency represented in CSR format
int adj_ptrs[] = {0, 3, 6, 8, 12, 16, 19, 21, 22};
int adj_weights[] = {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}; // each edge has weight 1
int adj[] = {1, 2, 3, 0, 3, 4, 0, 3, 0, 1, 2, 5, 1, 5, 6, 7, 3, 4, 6, 4, 5, 4};
// the hierarchy is 2:2 and the distance is 1:10
int hierarchy[] = {2, 2};
int distance[] = {1, 10};
int l = 2;
// imbalance 3%, 1 thread, seed 0
float imbalance = 0.03;
int n_threads = 1;
int seed = 0;
shared_map_strategy_type_t strategy = NB_LAYER; // distribution strategy: Non-Blocking Layer
shared_map_algorithm_type_t parallel_alg = MTKAHYPAR_HIGHEST_QUALITY; // parallel algorithm : Mt-KaHyPar Highest Quality
shared_map_algorithm_type_t serial_alg = KAFFPA_STRONG; // serial algorithm : KaFFPa Strong
// whether to print erors and statistics
bool verbose_error = true;
bool verbose_statistics = true;
// assert the input (optional)
bool assert_passed = shared_map_hierarchical_multisection_assert_input(n, v_weights, adj_ptrs, adj_weights, adj, hierarchy, distance, l, imbalance, n_threads, seed, strategy, parallel_alg, serial_alg, verbose_error);
if (!assert_passed) {
std::cout << "Error while asserting the input!" << std::endl;
exit(EXIT_FAILURE);
}
// output variables
int comm_cost;
int partition[n];
// do the actual hierarchical multisection
shared_map_hierarchical_multisection(n, v_weights, adj_ptrs, adj_weights, adj, hierarchy, distance, l, imbalance, n_threads, seed, strategy, parallel_alg, serial_alg, comm_cost, partition, verbose_statistics);
std::cout << "Communication Cost J(C, D, PI): " << comm_cost << std::endl;
std::cout << "Partition: ";
for (int i = 0; i < n; ++i) { std::cout << partition[i] << " "; }
std::cout << std::endl;
return 0;
}
Bugs, Questions, Comments and Ideas
If any bugs arise, questions occur, comments want to be shared, or ideas discussed, please do not hesitate to contact the current repository owner (henning.woydt@informatik.uni-heidelberg.de) or leave a GitHub Issue or Discussion. Thanks!
Licensing
SharedMap is a free software provided under the MIT License. For more information see the LICENSE file. This algorithm is available to everyone, welcoming all who wish to make use of it. If you use SharedMap in an academic setting please cite (ACDA25).
@inproceedings{SchulzW25,
author = {Schulz, Christian and Woydt, Henning},
title = {{Shared-Memory Hierarchical Process Mapping}},
booktitle = {Proceedings of the 3rd Conference on Applied and Computational Discrete Algorithms (ACDA 2025)},
pages = {18--31},
publisher = {SIAM},
year = {2025},
doi = {10.1137/1.9781611978759.2}
}