Recording Data for Thicket, Hatchet, and TreeScape
April 30, 2025
Caliper is the primary way to collect data for the Python performance analysis frameworks Thicket, Hatchet, and TreeScape:
- Thicket is a Python-based toolkit for Exploratory Data Analysis (EDA) of parallel performance data on supercomputers.
- Hatchet is a Python framework for analyzing call-path profiles and serves as the data import backend for Thicket.
- TreeScape (coming soon) is a Python toolkit for visually analyzing large sets of call-path profiles, e.g. for performance regression testing or scaling studies.
Preparing the code
Start by identifying code regions of interest and annotating them with Caliper region markers, as outlined in Section 1. In addition, add run metadata annotations with Adiak or Caliper, as outlined in Section 2. Metadata is especially important for distinguishing and categorizing data in Thicket and TreeScape, which operate on large sets of runs.
Recording profiles with the spot recipe
Thicket, Hatchet and TreeScape import profiles in Caliper's .cali file format.
While the .cali format supports a wide spectrum of performance experiments, the
tools discussed here expect aggregate performance profiles. Caliper provides several
built-in recipes to create such profiles, primarily spot and hatchet-region-profile:
- The spot recipe is the primary configuration for Thicket and TreeScape. It records summary performance statistics per Caliper region across MPI ranks.
- The hatchet-region-profile recipe records performance data for each MPI rank.
In the following, we focus on the spot recipe. Spot always records the
minimum, maximum, average, and total inclusive time in seconds in each Caliper
region across MPI ranks (including for non-MPI programs, where all values are
identical). "Inclusive" means the reported time includes the time spent in any
nested sub-regions of a given Caliper region. Usually, spot also records the
exclusive times (i.e., the time spent in a region excluding the time spent in
nested sub-regions) and number of region invocations, again providing
minimum, average, maximum, and total values across MPI ranks. Additionally,
a variety of optional features can be enabled.
Run cali-query --help spot for a full list of options.
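The difference between inclusive and exclusive time, and the way values are summarized across ranks, can be illustrated with a small stand-alone sketch. It does not use Caliper; the region paths, the per-rank times, and the helper names `exclusive_times` and `summarize` are made up for illustration only.

```python
from statistics import mean

# Hypothetical inclusive times in seconds, one dict per MPI rank.
# "main/foo" and "main/bar" are regions nested inside "main".
ranks = [
    {"main": 1.00, "main/foo": 0.25, "main/bar": 0.5},
    {"main": 1.50, "main/foo": 0.50, "main/bar": 0.5},
]

def exclusive_times(inclusive):
    """Subtract the inclusive time of each region's direct children."""
    result = {}
    for path, t in inclusive.items():
        children = [
            child_t for child, child_t in inclusive.items()
            if child.startswith(path + "/") and "/" not in child[len(path) + 1:]
        ]
        result[path] = t - sum(children)
    return result

def summarize(per_rank):
    """Compute min, max, avg, and total of each region across ranks."""
    summary = {}
    for path in per_rank[0]:
        values = [rank[path] for rank in per_rank]
        summary[path] = {
            "min": min(values),
            "max": max(values),
            "avg": mean(values),
            "total": sum(values),
        }
    return summary

inclusive_summary = summarize(ranks)
exclusive_summary = summarize([exclusive_times(r) for r in ranks])
print("inclusive main:", inclusive_summary["main"])
print("exclusive main:", exclusive_summary["main"])
```

In this made-up data set, main has an inclusive total of 2.5 seconds across both ranks but an exclusive total of only 0.75 seconds, because most of its time is spent in the nested foo and bar regions.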
The spot recipe can be activated with the CALI_CONFIG environment variable or
through the ConfigManager API. Spot produces one .cali file per program
run with the aggregated performance data from all participating MPI ranks. We
recommend placing all .cali files belonging to the same experiment in a single
directory. The outdir option can be helpful here. By default, spot generates
a unique file name that starts with the date and time of the run. Alternatively,
a custom file name can be provided with the output option. Make sure to use
unique file names for each run, as existing files with the same name will be
overwritten without warning.
Here is a simple example of recording profiles with the spot recipe:
$ CALI_CONFIG=spot,outdir=thicket_experiment basic_example
This will place a .cali file in the thicket_experiment directory. We can
use cali-query -T to view the file contents on the command line:
$ cali-query -T thicket_experiment/250429-105735_137467_5d56yY5hzYoT.cali
Path      Min time/rank Max time/rank Avg time/rank Total time [...]
main           0.000592      0.000592      0.000592   0.000592
main loop      0.000006      0.000006      0.000006   0.000006
bar            0.000001      0.000001      0.000001   0.000001
foo            0.000002      0.000002      0.000002   0.000002
setup          0.000573      0.000573      0.000573   0.000573
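When launching many runs from a script with custom file names, it can help to check for name clashes before the program starts, since existing files are overwritten without warning. The following is a minimal sketch of that idea. The helper `spot_env` and the run label `run_001` are hypothetical, and the sketch assumes that the output option accepts a relative path.

```python
import os
from pathlib import Path

def spot_env(outdir, run_name, base_env=None):
    """Return an environment dict that enables spot with a custom output file.

    run_name is a hypothetical, user-chosen label for the run.
    """
    target = Path(outdir) / f"{run_name}.cali"
    if target.exists():
        # The file would be overwritten without warning, so refuse early.
        raise FileExistsError(f"{target} already exists; choose another name")
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: the output option accepts a relative path.
    env["CALI_CONFIG"] = f"spot,output={target.as_posix()}"
    return env

env = spot_env("thicket_experiment", "run_001", base_env={})
print(env["CALI_CONFIG"])
```

A launcher script could then pass the returned dictionary as the `env` argument of `subprocess.run` when starting the annotated program.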
Loading Caliper data in Thicket
In Thicket, use thicket.Thicket.from_caliperreader() to create a Thicket
object from a list of Caliper files:
import glob
import thicket as tt
th = tt.Thicket.from_caliperreader(glob.glob("thicket_experiment/*.cali"))
# Create new metric column with time in microseconds
th.dataframe["Total time (usec)"] = 1e6*th.dataframe["Total time"]
# Print tree with new usec metric
print(th.tree(metric_column="Total time (usec)"))
Example output: [figure not reproduced: the Thicket call tree annotated with the "Total time (usec)" metric]

For more information, refer to the Thicket documentation.
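Because glob.glob() returns files in arbitrary order and yields an empty list when the directory name is mistyped, a small helper can make the file collection step more predictable. This is a sketch using only the Python standard library; the function name `collect_cali_files` is hypothetical and not part of Thicket or Caliper.

```python
import tempfile
from pathlib import Path

def collect_cali_files(directory):
    """Return a sorted list of .cali file paths in directory as strings."""
    files = sorted(str(p) for p in Path(directory).glob("*.cali"))
    if not files:
        raise FileNotFoundError(f"no .cali files found in {directory}")
    return files

# Demonstration with a temporary directory and empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.cali", "a.cali", "notes.txt"):
        (Path(tmp) / name).touch()
    names = [Path(f).name for f in collect_cali_files(tmp)]
    print(names)
```

The returned list could be passed to thicket.Thicket.from_caliperreader() in place of the glob.glob() call.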