Dendros

June 11, 2026

A Python toolkit for analyzing the results of the Galacticus semi-analytic model: both HDF5 model outputs and posterior-sample ("MCMC") chain logs.


Installation

pip install dendros

To also enable pandas and tabulate table output:

pip install 'dendros[pandas,tabulate]'

To enable plotting of Galacticus /analyses results (requires matplotlib):

pip install 'dendros[plot]'

Install the latest development version directly from GitHub:

pip install git+https://github.com/galacticusorg/dendros.git

Quickstart

Opening files

from dendros import open_outputs

# Single file
c = open_outputs("galacticus.hdf5")

# Auto-detect MPI-split outputs (given any one rank's file)
c = open_outputs("galacticus_MPI:0000.hdf5")

# Explicit list of files
c = open_outputs(["rank0.hdf5", "rank1.hdf5"])

# Glob pattern
c = open_outputs("run001/galacticus*.hdf5")

# Lightcone run (different top-level group)
c = open_outputs("lightcone.hdf5", output_root="Lightcone")

The object returned by open_outputs (a Collection) can be used as a context manager, which ensures the files are closed:

with open_outputs("galacticus.hdf5") as c:
    ...

Checking completion status

Galacticus writes a statusCompletion attribute when a run finishes. By default, validate_completion raises a RuntimeError if any file is incomplete; the mode argument downgrades this to a warning or disables the check:

with open_outputs("galacticus.hdf5") as c:
    c.validate_completion()           # raises RuntimeError if incomplete
    c.validate_completion(mode="warn")    # emit warning instead
    c.validate_completion(mode="ignore")  # do nothing
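
The three behaviours can be illustrated with a minimal sketch. This is not dendros's implementation: check_completion, its statuses argument, and the default mode name "error" are hypothetical names chosen for illustration. Only the "warn" and "ignore" mode values and the RuntimeError come from the example above.

```python
import warnings


def check_completion(statuses, mode="error"):
    """Report incomplete files according to `mode`.

    `statuses` maps a file name to True (complete) or False (incomplete).
    Hypothetical helper: the function name, the argument layout, and the
    default mode name "error" are assumptions, not dendros API.
    """
    if mode not in ("error", "warn", "ignore"):
        raise ValueError(f"unknown mode: {mode!r}")

    incomplete = sorted(name for name, done in statuses.items() if not done)
    if not incomplete or mode == "ignore":
        return incomplete

    message = "incomplete output file(s): " + ", ".join(incomplete)
    if mode == "warn":
        warnings.warn(message, RuntimeWarning)
        return incomplete
    raise RuntimeError(message)
```

Returning the list of incomplete files in the non-raising modes is a design choice of this sketch; it lets a caller decide what to skip after a warning.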

Listing available outputs

with open_outputs("galacticus.hdf5") as c:
    tbl = c.list_outputs()          # astropy Table by default
    print(tbl)

    # or as a pandas DataFrame:
    df = c.list_outputs(format="pandas")

    # or as a tabulate-formatted string:
    text = c.list_outputs(format="tabulate")

Example output:

index name    time scale_factor redshift output_type
----- ------- ---- ------------ -------- -----------
    1 Output1 13.8          1.0      0.0    snapshot
    2 Output2  6.0          0.5      1.0    snapshot

The output_type column reports the kind of output each group holds — tree, node, snapshot, or lightcone — and is None (a missing value) for older files that predate the outputType attribute.
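
The scale_factor and redshift columns in the example are related by the standard cosmological identity z = 1/a - 1, so a = 0.5 corresponds to z = 1. A minimal sketch of that conversion follows; the helper names are hypothetical and are not part of dendros.

```python
def redshift_from_scale_factor(a):
    """Return the redshift z for expansion factor a, using z = 1/a - 1."""
    if a <= 0:
        raise ValueError("scale factor must be positive")
    return 1.0 / a - 1.0


def scale_factor_from_redshift(z):
    """Inverse relation: a = 1 / (1 + z)."""
    if z <= -1:
        raise ValueError("redshift must be greater than -1")
    return 1.0 / (1.0 + z)


# The two rows of the example table: a = 1.0 and a = 0.5.
for a in (1.0, 0.5):
    print(a, redshift_from_scale_factor(a))
```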

You can also iterate over the output index directly via c.outputs:

with open_outputs("galacticus.hdf5") as c:
    for meta in c.outputs:
        print(meta.name, meta.redshift)

Listing available properties

with open_outputs("galacticus.hdf5") as c:
    tbl = c.list_properties("Output1")   # by name
    tbl = c.list_properties(1)           # by 1-based integer index
    print(tbl)

Example output:

name        dtype   shape   description          units
----------- ------- ------- -------------------- ------------
haloMass    float64 (1000,) Halo virial mass     Solar masses
stellarMass float64 (1000,) Stellar mass of disk Solar masses
...

The units column shows a human-readable units description (blank for dimensionless datasets).

Reading datasets

By default, datasets that carry units are returned as astropy.units.Quantity objects, so the units travel with the data. Dimensionless datasets are returned as plain numpy arrays. Pass as_quantity=False to get plain numpy arrays for every dataset.

with open_outputs("galacticus.hdf5") as c:
    # List of dataset paths → same strings used as dict keys
    data = c.read("Output1", ["nodeData/basicMass", "nodeData/diskMassStellar"])
    mass = data["nodeData/basicMass"]   # astropy Quantity, in solar masses
    print(mass.to("kg"))                # convert units
    print(mass.value)                   # underlying numpy array

    # Dict → custom labels
    data = c.read(
        "Output1",
        {"Mhalo": "nodeData/basicMass", "Mstar": "nodeData/diskMassStellar"},
    )
    print(data["Mhalo"])

    # Plain numpy arrays, no units
    data = c.read("Output1", ["nodeData/basicMass"], as_quantity=False)

Filtering galaxies

Pass a boolean mask or integer index array as where:

with open_outputs("galacticus.hdf5") as c:
    # First read to build a mask
    masses = c.read("Output1", ["nodeData/basicMass"])["nodeData/basicMass"]
    mask = masses.value > 1e12

    # Then read everything for the selected galaxies only
    data = c.read(
        "Output1",
        {"Mhalo": "nodeData/basicMass", "Mstar": "nodeData/diskMassStellar"},
        where=mask,
    )
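
Since where accepts either form, it may help to see how the two relate in plain NumPy. The sketch below uses made-up arrays rather than a real Galacticus file. It builds a combined boolean mask, derives the equivalent integer index array, and shows that both select the same rows. It assumes where follows ordinary NumPy indexing semantics, which the text above suggests but does not state.

```python
import numpy as np

# Made-up values standing in for two datasets read from one output.
halo_mass = np.array([3e11, 2e12, 8e12, 5e10, 1.5e12])
stellar_mass = np.array([1e9, 4e10, 9e10, 2e8, 0.0])

# Combine conditions with & (element-wise AND); the parentheses are required
# because & binds more tightly than the comparison operators.
mask = (halo_mass > 1e12) & (stellar_mass > 0)

# The same selection expressed as an integer index array.
indices = np.flatnonzero(mask)

print(indices)             # positions of the selected galaxies
print(halo_mass[mask])     # selection via boolean mask
print(halo_mass[indices])  # identical selection via integer indices
```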

h5py-like browsing

with open_outputs("galacticus.hdf5") as c:
    print(c.keys())                        # top-level groups
    grp = c["Outputs/Output1"]
    print(grp.keys())                      # subgroups / datasets
    print(grp.attrs)                       # group attributes
    ds = c["Outputs/Output1/nodeData/basicMass"]
    print(ds.dtype, ds.shape)

Plotting analyses

If a Galacticus run was configured to write reduced analysis results, the HDF5 file will contain a top-level /analyses group with one subgroup per analysis. Dendros can list those analyses and plot each model curve with its observational/target overlay. Requires the [plot] extra.

For MPI runs, the /analyses data is reduced over all ranks and is identical in every rank's file, so dendros reads only the primary file.

with open_outputs("galacticus.hdf5") as c:
    print(c.list_analyses())                     # tabulate available analyses

    figs = c.plot_analyses()                     # one matplotlib Figure per analysis
    figs = c.plot_analyses(name="stellarMassFunction",
                           output_directory="figs",
                           file_format="pdf")    # also save to disk

MPI outputs

When Galacticus runs with MPI, it writes one file per rank with the suffix _MPI:NNNN (e.g. galacticus_MPI:0000.hdf5, galacticus_MPI:0001.hdf5, …). All ranks contain identical metadata groups; galaxy datasets are split across ranks.

open_outputs handles this automatically:

# Any single-rank file → auto-detects all peers
c = open_outputs("galacticus_MPI:0000.hdf5")

# Or pass a glob pattern (an explicit list of files also works)
c = open_outputs("galacticus_MPI:????.hdf5")

c.read(...) transparently concatenates arrays across all ranks along axis 0.
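
How the peer files are located is not described here. One plausible approach, sketched below, is to replace the rank digits in the _MPI:NNNN suffix with wildcards and glob for matches. The helper names peer_glob and find_peers are hypothetical and are not dendros API.

```python
import re
from pathlib import Path

# Matches the rank suffix described above, e.g. "_MPI:0003", when it sits
# directly before the file extension.
_RANK_SUFFIX = re.compile(r"_MPI:(\d+)(?=\.[^.]+$)")


def peer_glob(filename):
    """Return a glob pattern matching every rank's file, or None if
    `filename` carries no _MPI:NNNN suffix."""
    match = _RANK_SUFFIX.search(filename)
    if match is None:
        return None
    wildcard = "?" * len(match.group(1))
    return filename[: match.start(1)] + wildcard + filename[match.end(1):]


def find_peers(path):
    """List all rank files in the same directory as `path`, sorted by name."""
    path = Path(path)
    pattern = peer_glob(path.name)
    if pattern is None:
        return [path]  # not an MPI-split file: it is its own only peer
    return sorted(path.parent.glob(pattern))


print(peer_glob("galacticus_MPI:0000.hdf5"))  # galacticus_MPI:????.hdf5
```

Sorting by name keeps the ranks in order as long as the rank numbers are zero-padded to a fixed width, as in the examples above.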


Lightcone outputs

For lightcone runs the top-level group is typically Lightcone rather than Outputs. Pass output_root to override the default:

c = open_outputs("lightcone.hdf5", output_root="Lightcone")

MCMC analysis

Dendros also reads Galacticus posterior-sample ("MCMC") chain logs given the config XML used to drive the run, and provides convergence diagnostics, post-burn analyses, parameter-file emission, and corner plots:

from dendros import open_mcmc

with open_mcmc("mcmcConfig.xml") as run:
    outliers = run.outlier_chains()
    step = run.convergence_step(threshold=1.1, drop_chains=outliers)

    ess = run.effective_sample_size(post_burn=step)
    fit = run.multivariate_normal_fit(post_burn=step, drop_chains=outliers)
    fit.write_reparameterization_config("reparam.xml")

    map_ = run.maximum_posterior(drop_chains=outliers)
    run.write_parameter_files(map_.state, "max_posterior")

    fig = run.corner_plot(post_burn=step, drop_chains=outliers)

Supported convergence diagnostics include the Brooks-Gelman corrected Rhat (with its non-parametric R_interval companion), Geweke z-scores, an iterative Grubbs outlier test on chain final states, Sokal-windowed autocorrelation times, effective sample sizes, and sliding-window acceptance rates. Supported analyses include projection-pursuit PCA, multivariate-normal fits with reparameterization-config emission, posterior sampling, and base-parameter file generation. Corner plots require the optional extra: pip install 'dendros[mcmc]'. See the MCMC docs page for details.
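
For orientation, the sketch below computes the basic Gelman-Rubin potential scale reduction factor for a single parameter from several chains. It is deliberately the simpler textbook statistic, without the Brooks-Gelman correction that dendros is described as applying, and the function name is hypothetical. Values close to 1 indicate that the chains agree with one another, which is what a cut such as threshold=1.1 in the example above tests for.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic (uncorrected) Gelman-Rubin Rhat for one parameter.

    `chains` is a list of equal-length sequences of samples, one per chain.
    Illustrative only: this omits the Brooks-Gelman correction.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2:
        raise ValueError("need at least two chains of at least two samples")
    if any(len(chain) != n for chain in chains):
        raise ValueError("all chains must have the same length")

    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])  # W: mean within-chain variance
    between = n * variance(chain_means)                   # B: between-chain variance
    if within == 0:
        raise ValueError("within-chain variance is zero")

    # Pooled estimate of the posterior variance.
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


agreeing = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
separated = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]
print(gelman_rubin(agreeing))   # below 1.1: chains overlap
print(gelman_rubin(separated))  # well above 1.1: chains disagree
```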


Documentation

Full API reference and more examples are available at dendros.readthedocs.io.


Contributing

See CONTRIBUTING.md for development setup, coding style, and how to propose changes.


License

Dendros is released under the GNU General Public License v3.0 or later.