GRAMSCI

June 18, 2026 · View on GitHub

GRAph Made Statistics for Cosmological Information — fast 2-, 3- and 4-point correlation functions (including the parity-odd 4PCF) of 3D point sets, built on a graph-database representation of the pair graph.

License: GPL v3 arXiv

Baryon acoustic feature in the 3-point correlation function measured with GRAMSCI

GRAMSCI measures the N-point correlation functions of a 3D catalogue (a galaxy survey, a simulation box, …). It builds a KD-tree, stores each point's neighbours within rmax as a compact adjacency graph, and then counts N-point configurations directly from the graph — the design principle that makes it fast and memory-light.

  • 2PCF (with optional μ bins), full 3PCF (all triangles) + equilateral, full 4PCF, and the parity-odd 4PCF.
  • Runs on CPU (OpenMP), any GPU via OpenCL (incl. Apple Silicon), or NVIDIA GPUs via OpenACC — same command-line interface and outputs.
  • A Python API that takes NumPy arrays, plus a command-line tool.

Sabiu, Hoyle, Kim & Li, ApJS 242, 29 (2019), arXiv:1901.00296. Please cite if you use GRAMSCI — see Citation.


Install

Needs a Fortran compiler with OpenMP (gfortran). On macOS: brew install gcc. On macOS the build auto-detects the SDK for the linker.

git clone <repo-url> gramsci && cd gramsci
make                    # builds bin/gramsci (+ bin/domain_decomposition)
pip install -e python   # optional: the Python interface (needs numpy)

Optional GPU builds — identical CLI and output formats:

make gpu    # portable OpenCL  -> bin/gramsci_cl   (Apple Silicon / AMD / Intel; see src_opencl/)
make cuda   # NVIDIA OpenACC   -> bin/gramsci_gpu  (needs nvfortran;            see src_gpu/)

Quickstart — Python

import numpy as np
import gramsci

data = ...     # (N, 3) array of x, y, z in comoving Mpc/h
wd   = ...     # (N,)   per-object weights
rand = ...     # (M, 3) random catalogue covering the same volume
wr   = ...     # (M,)   random weights

# --- 2-point correlation function ---
xi = gramsci.compute_2pcf(data, wd, rand, wr, rmin=1, rmax=150, nbins=30)
#   xi.r, xi.xi, xi.NN, xi.RR

# --- 3-point correlation function (all triangle configurations) ---
z = gramsci.compute_3pcf(data, wd, rand, wr, rmin=20, rmax=140, nbins=10)
#   one row per (r1, r2, r3) triangle:  z.r1, z.r2, z.r3, z.zeta

# --- connected & parity 4PCF ---
q  = gramsci.compute_4pcf(data, wd, rand, wr, rmin=20, rmax=140, nbins=4)               # q.zeta_conn
qp = gramsci.compute_4pcf(data, wd, rand, wr, rmin=20, rmax=140, nbins=4, parity=True)  # qp.zeta_even, qp.zeta_odd

Each call returns a small result object whose attributes are NumPy arrays. Point it at a GPU build with binary="bin/gramsci_cl" or export GRAMSCI_BIN=....

👉 See example/quickstart.ipynb for an end-to-end tutorial: catalogue → measurement → plot.

Quickstart — command line

Input catalogues are plain ASCII, 4 columns x y z weight (comoving Mpc/h):

bin/gramsci -gal galaxies.dat -ran randoms.dat \
    -rmin 20 -rmax 140 -nbins 10 -3pcf -out result.3pcf
FlagMeaning
-gal <file>galaxy/data catalogue (x y z [weight])
-ran <file>random catalogue (same format, optional)
-out <file>output filename
-rmin / -rmaxmin / max radial separation
-nbins <N>number of radial bins
-nmu <N>number of μ bins (anisotropic; default 1 = isotropic)
-loglogarithmic radial binning
-2pcf2-point correlation function
-3pcf / -equi3PCF (all triangles) / equilateral only
-4pcf / -4pcfp4PCF / 4PCF with parity decomposition

For very large catalogues or large rmax, bin/domain_decomposition splits the volume into sub-regions that can be run separately (see below).

Output

Every output file is whitespace-delimited with a one-line header; the bin edges come first, then counts and the estimator. The Python wrapper unpacks them:

ModeKey columns (Python attributes)
-2pcfr, NN, RR, xi
-3pcfr1, r2, r3, NNN, RRR, zeta
-4pcf6 side bins, NNNN, RRRR, zeta, zeta_disc, zeta_conn
-4pcfp… plus zeta_even, NNNN_odd, RRRR_odd, zeta_odd

How it works

GRAMSCI runs in two phases:

  1. Graph construction. A KD-tree is built from the points; for each point all neighbours within rmax are found and stored as an adjacency list. Pairwise distances are binned at this stage, so the graph stores integer bin indices rather than floating-point distances. The KD-tree and coordinates are then freed.
  2. Graph query. All N-point counting is done from the adjacency lists alone, without the original coordinates — a compact, cache-friendly structure optimised for combinatorial enumeration.

For the parity 4PCF, each edge also stores a one-byte "direction pixel" encoding the displacement direction, enabling chirality classification without retaining coordinates.

GPU builds

BuildDirectoryPrecisionHardware
gramscisrc/doubleCPU (OpenMP)
gramsci_clsrc_opencl/singleAny OpenCL 1.2 GPU (Apple Silicon, AMD, Intel)
gramsci_gpusrc_gpu/doubleNVIDIA GPUs (OpenACC / nvfortran)

The GPU builds offload the 3PCF, equilateral, 4PCF and parity-4PCF queries; 2PCF and any anisotropic (-nmu>1) queries run on the CPU. The OpenCL build is single precision (Apple Silicon GPUs have no fp64) and validates against the CPU reference to ~10⁻⁶ relative — for publication-grade fp64 use the CPU or NVIDIA build. See src_opencl/README.md and src_gpu/README.md for details and benchmarks.

Testing

cd tests && python3 run_correlation_tests.py     # CPU regression tests
bash ../src_opencl/validate.sh                   # CPU vs OpenCL agreement (if built)

Large catalogues

domain_decompose.sh data ran Nregion Rmax splits the catalogues into Nregion³ spatial sub-regions, reducing the per-region graph RAM. Each region is run separately and the results combined afterwards. Useful when the graph would otherwise exceed available memory.

Repository layout

src/         CPU build (Fortran + OpenMP)          src_gpu/   NVIDIA OpenACC build
src_opencl/  portable OpenCL GPU build             python/    Python interface (gramsci)
bin/         compiled executables                  example/   sample data, scripts, figures
tests/       regression + benchmark scripts        paper/     the method paper

Citation

If GRAMSCI contributes to your research, please cite:

@article{Sabiu2019_GRAMSCI,
  author  = {Sabiu, Cristiano G. and Hoyle, Ben and Kim, Juhan and Li, Xiao-Dong},
  title   = {Graph Database Solution for Higher-order Spatial Statistics
             in the Era of Big Data},
  journal = {The Astrophysical Journal Supplement Series},
  year    = {2019},
  volume  = {242},
  number  = {2},
  pages   = {29},
  doi     = {10.3847/1538-4365/ab22b5},
  eprint  = {1901.00296},
  archivePrefix = {arXiv},
  primaryClass  = {astro-ph.CO}
}

License

GNU General Public License v3 — see COPYING.

Cristiano Sabiu · csabiu@gmail.com