Dendros

June 11, 2026

A Python toolkit for analyzing Galacticus semi-analytic model outputs — both HDF5 model outputs and posterior-sample ("MCMC") chain logs.

Installation

pip install dendros

To also enable pandas and tabulate table output:

pip install 'dendros[pandas,tabulate]'

To enable plotting of Galacticus /analyses results (requires matplotlib):

pip install 'dendros[plot]'

Install the latest development version directly from GitHub:

pip install git+https://github.com/galacticusorg/dendros.git

Quickstart

Opening files

from dendros import open_outputs

# Single file
c = open_outputs("galacticus.hdf5")

# Auto-detect MPI-split outputs (given any one rank's file)
c = open_outputs("galacticus_MPI:0000.hdf5")

# Explicit list of files
c = open_outputs(["rank0.hdf5", "rank1.hdf5"])

# Glob pattern
c = open_outputs("run001/galacticus*.hdf5")

# Lightcone run (different top-level group)
c = open_outputs("lightcone.hdf5", output_root="Lightcone")

open_outputs returns a Collection. Use it as a context manager to ensure the underlying files are closed:

with open_outputs("galacticus.hdf5") as c:
    ...

Checking completion status

Galacticus writes a statusCompletion attribute when a run finishes. validate_completion raises an error if any file is incomplete:

with open_outputs("galacticus.hdf5") as c:
    c.validate_completion()           # raises RuntimeError if incomplete
    c.validate_completion(mode="warn")    # emit warning instead
    c.validate_completion(mode="ignore")  # do nothing
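
The three modes above can be pictured with a small stand-alone sketch. It does not use dendros or read any HDF5 file: `check_completion` and its `statuses` mapping are hypothetical names, and the name `"error"` for the default mode is an assumption (the source only shows that the default raises RuntimeError).

```python
import warnings


def check_completion(statuses, mode="error"):
    """Report incomplete files from a {filename: is_complete} mapping.

    Hypothetical helper for illustration; not part of dendros.
    The mode name "error" is an assumption for the default behaviour.
    """
    if mode not in ("error", "warn", "ignore"):
        raise ValueError(f"unknown mode: {mode!r}")

    incomplete = sorted(name for name, done in statuses.items() if not done)
    if not incomplete or mode == "ignore":
        return incomplete

    message = "incomplete outputs: " + ", ".join(incomplete)
    if mode == "warn":
        warnings.warn(message, RuntimeWarning)
        return incomplete
    raise RuntimeError(message)


# Made-up statuses: one finished file, one interrupted file.
statuses = {"rank0.hdf5": True, "rank1.hdf5": False}
print(check_completion(statuses, mode="ignore"))
```

Raising by default is the conservative choice here: an interrupted run is easy to miss, and a partial file can otherwise look like a small but valid result.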

Listing available outputs

with open_outputs("galacticus.hdf5") as c:
    tbl = c.list_outputs()          # astropy Table by default
    print(tbl)

    # or as a pandas DataFrame:
    df = c.list_outputs(format="pandas")

    # or as a tabulate-formatted string:
    text = c.list_outputs(format="tabulate")

Example output:

index name    time scale_factor redshift output_type
----- ------- ---- ------------ -------- -----------
    1 Output1 13.8          1.0      0.0    snapshot
    2 Output2  6.0          0.5      1.0    snapshot

The output_type column reports the kind of output each group holds — tree, node, snapshot, or lightcone — and is None (a missing value) for older files that predate the outputType attribute.
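
The scale_factor and redshift columns in the example table are consistent with the standard cosmological relation z = 1/a - 1. The sketch below only reproduces that arithmetic for the two example rows; `redshift_from_scale_factor` is a hypothetical helper, and whether dendros computes the column this way or reads it from the file is not stated here.

```python
def redshift_from_scale_factor(a):
    """Standard relation between expansion factor a and redshift z."""
    if a <= 0:
        raise ValueError("scale factor must be positive")
    return 1.0 / a - 1.0


# Scale factors taken from the example table above.
for name, a in [("Output1", 1.0), ("Output2", 0.5)]:
    print(name, a, redshift_from_scale_factor(a))
```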

You can also access the index object directly:

with open_outputs("galacticus.hdf5") as c:
    for meta in c.outputs:
        print(meta.name, meta.redshift)

Listing available properties

with open_outputs("galacticus.hdf5") as c:
    tbl = c.list_properties("Output1")   # by name
    tbl = c.list_properties(1)           # by 1-based integer index
    print(tbl)

Example output:

name        dtype   shape   description          units
----------- ------- ------- -------------------- ------------
haloMass    float64 (1000,) Halo virial mass     Solar masses
stellarMass float64 (1000,) Stellar mass of disk Solar masses
...

The units column shows a human-readable units description (blank for dimensionless datasets).

Reading datasets

By default, datasets that carry units are returned as astropy.units.Quantity objects, so the units travel with the data. Dimensionless datasets are returned as plain numpy arrays. Pass as_quantity=False to get plain numpy arrays for every dataset.

with open_outputs("galacticus.hdf5") as c:
    # List of dataset paths → same strings used as dict keys
    data = c.read("Output1", ["nodeData/basicMass", "nodeData/diskMassStellar"])
    mass = data["nodeData/basicMass"]   # astropy Quantity, in solar masses
    print(mass.to("kg"))                # convert units
    print(mass.value)                   # underlying numpy array

    # Dict → custom labels
    data = c.read(
        "Output1",
        {"Mhalo": "nodeData/basicMass", "Mstar": "nodeData/diskMassStellar"},
    )
    print(data["Mhalo"])

    # Plain numpy arrays, no units
    data = c.read("Output1", ["nodeData/basicMass"], as_quantity=False)

Filtering galaxies

Pass a boolean mask or an integer index array as the where argument to read only the selected galaxies:

with open_outputs("galacticus.hdf5") as c:
    # First read to build a mask
    masses = c.read("Output1", ["nodeData/basicMass"])["nodeData/basicMass"]
    mask = masses.value > 1e12

    # Then read everything for the selected galaxies only
    data = c.read(
        "Output1",
        {"Mhalo": "nodeData/basicMass", "Mstar": "nodeData/diskMassStellar"},
        where=mask,
    )
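
The two accepted forms of where, a boolean mask and an integer index array, select the same rows when they describe the same galaxies. The sketch below shows this with plain numpy arrays and made-up values; it assumes where follows ordinary numpy indexing semantics and does not call dendros.

```python
import numpy as np

# Made-up halo and stellar masses for four galaxies.
halo_mass = np.array([3.0e11, 2.5e12, 8.0e11, 4.0e12])
stellar_mass = np.array([1.0e9, 6.0e10, 5.0e9, 9.0e10])

# Boolean mask: one True/False entry per galaxy.
mask = halo_mass > 1e12

# Equivalent integer index array: positions where the mask is True.
index = np.flatnonzero(mask)

# Both forms pick out the same galaxies, in the same order.
selected_by_mask = stellar_mass[mask]
selected_by_index = stellar_mass[index]
print(index, selected_by_mask)
```

A boolean mask is usually the most direct result of a comparison, while an index array is handy when the selection comes from sorting or random sub-sampling.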

h5py-like browsing

with open_outputs("galacticus.hdf5") as c:
    print(c.keys())                        # top-level groups
    grp = c["Outputs/Output1"]
    print(grp.keys())                      # subgroups / datasets
    print(grp.attrs)                       # group attributes
    ds = c["Outputs/Output1/nodeData/basicMass"]
    print(ds.dtype, ds.shape)

Plotting analyses

If a Galacticus run was configured to write reduced analysis results, the HDF5 file will contain a top-level /analyses group with one subgroup per analysis. Dendros can list those analyses and plot each model curve with its observational/target overlay. Plotting requires the [plot] extra.

For MPI runs, the /analyses data is reduced over all ranks and is identical in every rank's file, so dendros reads only the primary file.

with open_outputs("galacticus.hdf5") as c:
    print(c.list_analyses())                     # tabulate available analyses

    figs = c.plot_analyses()                     # one matplotlib Figure per analysis
    figs = c.plot_analyses(name="stellarMassFunction",
                           output_directory="figs",
                           file_format="pdf")    # also save to disk

MPI outputs

When Galacticus runs with MPI, it writes one file per rank with the suffix _MPI:NNNN (e.g. galacticus_MPI:0000.hdf5, galacticus_MPI:0001.hdf5, …). All ranks contain identical metadata groups; galaxy datasets are split across ranks.

open_outputs handles this automatically:

# Any single-rank file → auto-detects all peers
c = open_outputs("galacticus_MPI:0000.hdf5")

# Or pass a glob pattern (an explicit list of files also works)
c = open_outputs("galacticus_MPI:????.hdf5")
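
As a rough illustration of how one rank's filename can lead to all of its peers, the sketch below turns the _MPI:NNNN suffix into a glob pattern and matches it against a list of names. This is only one possible approach, not dendros's actual implementation; `peer_pattern` is a hypothetical helper, and the four-digit rank and .hdf5 extension are assumptions taken from the example filenames.

```python
import fnmatch
import re

# Matches the rank suffix described above, e.g. "_MPI:0000" before ".hdf5".
_RANK_SUFFIX = re.compile(r"_MPI:\d{4}(?=\.hdf5$)")


def peer_pattern(filename):
    """Return a glob pattern matching every rank's file, or None.

    Hypothetical helper for illustration; not part of dendros.
    """
    pattern, count = _RANK_SUFFIX.subn("_MPI:????", filename)
    return pattern if count else None


# Made-up directory listing.
names = [
    "galacticus_MPI:0001.hdf5",
    "galacticus_MPI:0000.hdf5",
    "other.hdf5",
]
pattern = peer_pattern("galacticus_MPI:0000.hdf5")
peers = sorted(fnmatch.filter(names, pattern))
print(pattern, peers)
```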

c.read(...) transparently concatenates arrays across all ranks along axis 0.
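
Concatenation along axis 0 means each rank contributes rows (galaxies), while any trailing dimensions stay unchanged. The sketch below shows the idea with plain numpy and made-up per-rank dictionaries; the `nodeData/position` dataset name is hypothetical, and this is not the library's internal code.

```python
import numpy as np

# Made-up per-rank results: rank 0 holds two galaxies, rank 1 holds one.
rank0 = {
    "nodeData/basicMass": np.array([1.0e12, 2.0e12]),
    "nodeData/position": np.array([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]),
}
rank1 = {
    "nodeData/basicMass": np.array([3.0e11]),
    "nodeData/position": np.array([[6.0, 7.0, 8.0]]),
}

# Join each dataset across ranks along the galaxy axis (axis 0).
combined = {
    key: np.concatenate([rank[key] for rank in (rank0, rank1)], axis=0)
    for key in rank0
}
print({key: value.shape for key, value in combined.items()})
```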

Lightcone outputs

For lightcone runs the top-level group is typically Lightcone rather than Outputs. Pass output_root to override the default:

c = open_outputs("lightcone.hdf5", output_root="Lightcone")

MCMC analysis

Dendros also reads Galacticus posterior-sample ("MCMC") chain logs given the config XML used to drive the run, and provides convergence diagnostics, post-burn analyses, parameter-file emission, and corner plots:

from dendros import open_mcmc

with open_mcmc("mcmcConfig.xml") as run:
    outliers = run.outlier_chains()
    step = run.convergence_step(threshold=1.1, drop_chains=outliers)

    ess = run.effective_sample_size(post_burn=step)
    fit = run.multivariate_normal_fit(post_burn=step, drop_chains=outliers)
    fit.write_reparameterization_config("reparam.xml")

    map_ = run.maximum_posterior(drop_chains=outliers)
    run.write_parameter_files(map_.state, "max_posterior")

    fig = run.corner_plot(post_burn=step, drop_chains=outliers)

Supported features include Brooks-Gelman corrected Rhat (with the non-parametric R_interval companion), Geweke z-scores, an iterative Grubbs outlier test on chain final states, Sokal-windowed autocorrelation times, effective sample sizes, sliding-window acceptance rates, projection-pursuit PCA, multivariate-normal fits with reparameterization-config emission, posterior sampling, and base-parameter file generation. Corner plots require the optional mcmc extra: pip install 'dendros[mcmc]'. See the MCMC docs page for details.
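
To give a feel for what an Rhat diagnostic measures, here is a minimal sketch of the classic Gelman-Rubin statistic for a single parameter. It is deliberately the simpler, uncorrected form, not the Brooks-Gelman corrected version that dendros reports; `gelman_rubin_rhat` is a hypothetical helper and the chains are synthetic.

```python
import math
import random
import statistics


def gelman_rubin_rhat(chains):
    """Classic (uncorrected) Gelman-Rubin Rhat for one parameter.

    chains: list of equal-length lists of samples, one list per chain.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(chain) != n for chain in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")

    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: mean of the within-chain sample variances.
    w = statistics.fmean([statistics.variance(chain) for chain in chains])
    # B / n: sample variance of the chain means.
    b_over_n = statistics.variance(chain_means)
    # Pooled variance estimate, compared against the within-chain variance.
    var_plus = (n - 1) / n * w + b_over_n
    return math.sqrt(var_plus / w)


rng = random.Random(0)
# Four synthetic chains sampling the same unit Gaussian.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Same chains, but the last one is offset, as if stuck in another mode.
stuck = [chain[:] for chain in mixed]
stuck[3] = [x + 3.0 for x in stuck[3]]

rhat_mixed = gelman_rubin_rhat(mixed)
rhat_stuck = gelman_rubin_rhat(stuck)
print(rhat_mixed, rhat_stuck)
```

Well-mixed chains give a value close to 1, while the offset chain pushes it well above the threshold of 1.1 used in the convergence_step call above.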