Gematria code structure
March 20, 2023
This document describes the structure of the Gematria repository.
Basic structure
The Python code is split into packages according to their function; the C++ code
follows the same directory structure. By convention, we put all C++ files into
gematria/{package_name}/, and the Python libraries under
gematria/{package_name}/python.
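The layout convention can be written down as a few small helpers. This is a sketch for illustration only. The function names are hypothetical, and python_module assumes that each directory can be imported as a regular Python package.

```python
from pathlib import PurePosixPath

# Root of the source tree, as named in the convention above.
ROOT = PurePosixPath("gematria")


def cpp_dir(package_name: str) -> PurePosixPath:
    """C++ files of a package live directly in gematria/{package_name}/."""
    return ROOT / package_name


def python_dir(package_name: str) -> PurePosixPath:
    """Python libraries of a package live in gematria/{package_name}/python."""
    return cpp_dir(package_name) / "python"


def python_module(package_name: str, module: str) -> str:
    """Dotted import path, assuming every directory is an importable package."""
    return ".".join(python_dir(package_name).parts + (module,))
```

For example, python_module("model", "model_base") gives "gematria.model.python.model_base", which corresponds to the file model/python/model_base.py listed further below.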
Package: basic_block
Contains code and data structures related to the in-memory representation of basic blocks. While we use protos to store the data at rest, most of the code uses a lightweight data structure that is easy to share between Python and C++.
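Here is a minimal sketch of what such a lightweight, proto-free structure can look like in Python. The class and field names (Instruction, BasicBlock, input_operands, output_operands) are hypothetical and chosen only for illustration. The actual classes in basic_block, and the way they are shared with C++, may look quite different.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Instruction:
    """Hypothetical, simplified instruction: a mnemonic plus register operands."""
    mnemonic: str
    input_operands: Tuple[str, ...] = ()
    output_operands: Tuple[str, ...] = ()


@dataclass(frozen=True)
class BasicBlock:
    """Hypothetical in-memory basic block: an ordered sequence of instructions."""
    instructions: Tuple[Instruction, ...] = ()

    def mnemonics(self) -> Tuple[str, ...]:
        """Returns the instruction mnemonics in program order."""
        return tuple(instr.mnemonic for instr in self.instructions)


# A two-instruction block: RAX = RSI; RAX = RAX + RBX.
block = BasicBlock((
    Instruction("MOV", input_operands=("RSI",), output_operands=("RAX",)),
    Instruction("ADD", input_operands=("RAX", "RBX"), output_operands=("RAX",)),
))
```

Plain immutable records like these carry no serialization machinery. That is the property that makes a lightweight structure cheaper to pass around than the stored proto.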
Package: granite
Implementation of the GRANITE model and base classes for building models with graph neural networks. The graph construction from basic block data is implemented in C++ for efficiency reasons.
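The sketch below illustrates, in Python rather than C++, the general idea of turning a basic block into a graph. Instructions and register values become nodes, and edges record program order and which values each instruction reads and writes. This is a deliberately simplified, hypothetical scheme: the names build_graph, "structural", "input" and "output" are made up here. It is not the set of node and edge types that GRANITE actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Instruction:
    mnemonic: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def build_graph(instructions: Sequence[Instruction]):
    """Builds a simplified dependency graph.

    Returns (nodes, edges): nodes is a list of labels, edges is a list of
    (source index, destination index, edge kind) triples.
    """
    nodes: List[str] = []
    edges: List[Tuple[int, int, str]] = []
    last_value: Dict[str, int] = {}  # Register -> node holding its latest value.
    prev_instr: Optional[int] = None

    def value_node(reg: str) -> int:
        # A register read before any write gets a fresh "live-in" value node.
        if reg not in last_value:
            nodes.append(reg)
            last_value[reg] = len(nodes) - 1
        return last_value[reg]

    for instr in instructions:
        nodes.append(instr.mnemonic)
        instr_node = len(nodes) - 1
        if prev_instr is not None:
            edges.append((prev_instr, instr_node, "structural"))  # Program order.
        for reg in instr.inputs:
            edges.append((value_node(reg), instr_node, "input"))
        for reg in instr.outputs:
            nodes.append(reg)  # Each write creates a new value node.
            last_value[reg] = len(nodes) - 1
            edges.append((instr_node, last_value[reg], "output"))
        prev_instr = instr_node
    return nodes, edges
```

For the block MOV RAX, RSI; ADD RAX, RBX, the edge (2, 3, "input") links the RAX value written by MOV to the ADD that reads it. This is the kind of data dependency a graph neural network can then propagate information along.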
Notable files:
granite/python/run_granite_model.py: the main module for running the GRANITE model.
Package: io
Input/output utilities.
Package: model
Contains base classes for building models and the necessary support code for training and inference (the implementation of the training loop, an inference loop, and a generic main function for running Gematria models).
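The split between model-independent loops and model-specific hooks can be sketched with an abstract base class. Everything here is hypothetical: ModelBase, predict, train_step, train, run_inference and the toy MeanModel do not describe the real API of model_base.py. Here the predictions stand for basic block throughputs.

```python
import abc
from typing import List, Sequence


class ModelBase(abc.ABC):
    """Hypothetical base class holding model-independent training/inference code."""

    @abc.abstractmethod
    def predict(self, block) -> float:
        """Model-specific: returns the predicted throughput of one block."""

    @abc.abstractmethod
    def train_step(self, blocks: Sequence, targets: Sequence[float]) -> float:
        """Model-specific: updates the model on one batch and returns the loss."""

    def train(self, blocks: Sequence, targets: Sequence[float],
              num_epochs: int, batch_size: int) -> List[float]:
        """Generic training loop: iterates over mini-batches for num_epochs."""
        losses = []
        for _ in range(num_epochs):
            for start in range(0, len(blocks), batch_size):
                end = start + batch_size
                losses.append(self.train_step(blocks[start:end], targets[start:end]))
        return losses

    def run_inference(self, blocks: Sequence) -> List[float]:
        """Generic inference loop."""
        return [self.predict(block) for block in blocks]


class MeanModel(ModelBase):
    """Toy model: always predicts the mean of all targets seen so far."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def predict(self, block) -> float:
        return self.total / self.count if self.count else 0.0

    def train_step(self, blocks, targets) -> float:
        self.total += sum(targets)
        self.count += len(targets)
        mean = self.total / self.count
        # Mean squared error of the batch against the updated running mean.
        return sum((t - mean) ** 2 for t in targets) / len(targets)
```

A concrete model only fills in the two abstract hooks and inherits the loops unchanged. This reflects the division of labor described for the model package.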
Notable files:
model/python/model_base.py: the base class for all Gematria models. Contains most of the model-independent code, like the training and inference loops, cost definition, ...
model/python/main_function.py: contains model-independent code needed to launch Gematria models from the command line: definitions of command-line flags, and a generic main() function.
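One way to build a generic main() is to parse the flags once and delegate the model-specific part to a factory supplied by each model's entry point. The sketch uses the standard argparse module to stay self-contained. The flag names (--action, --num_epochs, --batch_size), the model_factory parameter and EchoModel are hypothetical. They do not describe the flags actually defined in main_function.py.

```python
import argparse
from typing import Any, Callable, Optional, Sequence


def make_parser() -> argparse.ArgumentParser:
    """Defines the (hypothetical) model-independent command-line flags."""
    parser = argparse.ArgumentParser(description="Generic model launcher (sketch).")
    parser.add_argument("--action", choices=("train", "predict"), default="train")
    parser.add_argument("--num_epochs", type=int, default=1)
    parser.add_argument("--batch_size", type=int, default=32)
    return parser


def main(model_factory: Callable[[], Any],
         argv: Optional[Sequence[str]] = None) -> Any:
    """Parses flags, builds the model, and dispatches on --action."""
    args = make_parser().parse_args(argv)
    model = model_factory()  # The only model-specific step.
    if args.action == "train":
        return model.train(num_epochs=args.num_epochs, batch_size=args.batch_size)
    return model.predict()


class EchoModel:
    """Toy stand-in for a real model; reports how it was invoked."""

    def train(self, num_epochs: int, batch_size: int) -> str:
        return f"train: epochs={num_epochs}, batch_size={batch_size}"

    def predict(self) -> str:
        return "predict"
```

Calling main(EchoModel, ["--action", "train", "--num_epochs", "3"]) returns "train: epochs=3, batch_size=32". A per-model script would then need little more than a call such as main(MyModel), with MyModel being whatever class that model defines.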
Directory: proto
Protocol buffer definitions used in the project.
Package: sequence
Base classes and implementation of models that treat the basic block as a sequence of instructions. In particular, it contains the implementation of the Ithemal model and the Ithemal+ model described in the GRANITE paper.
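Ithemal-style models read a basic block roughly as a two-level sequence: tokens within each instruction, and instructions within the block. The hypothetical helpers below (tokenize_block, to_ids) sketch how assembly text could be turned into such nested sequences of token ids. The actual tokenization used by the sequence package is not described here and may differ.

```python
from typing import Dict, List


def tokenize_block(block_text: str) -> List[List[str]]:
    """Splits assembly text into one token list (mnemonic, operands) per line."""
    sequences = []
    for line in block_text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        mnemonic, _, operands = line.partition(" ")
        tokens = [mnemonic.upper()]
        tokens += [op.strip().upper() for op in operands.split(",") if op.strip()]
        sequences.append(tokens)
    return sequences


def to_ids(sequences: List[List[str]], vocab: Dict[str, int]) -> List[List[int]]:
    """Maps tokens to integer ids, adding unseen tokens to the vocabulary."""
    return [[vocab.setdefault(tok, len(vocab)) for tok in seq] for seq in sequences]
```

For "mov rax, rsi" followed by "add rax, rbx" and an empty vocabulary, to_ids yields [[0, 1, 2], [3, 1, 4]]. RAX maps to the same id in both instructions. A sequence model would first encode each inner list into an instruction vector, then run over the outer list of instructions.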
Notable files:
sequence/python/run_sequence_model.py: the main module for running the Ithemal and Ithemal+ models.
Package: testing
Contains helper classes and functions for testing the model and a tiny data set of basic blocks for testing.
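A shared test base class is one common way to expose a tiny data set to many tests. The sketch below uses Python's unittest module. TinyDatasetTestCase, assertPredictionsClose, and the example blocks and throughput values are all hypothetical. They are not taken from the testing package or its data set.

```python
import unittest


class TinyDatasetTestCase(unittest.TestCase):
    """Hypothetical helper base class: gives every test the same tiny data set."""

    def setUp(self):
        super().setUp()
        # Hypothetical (block, throughput) pairs, for illustration only.
        self.blocks = ["mov rax, rsi", "add rax, rbx", "imul rax, rcx"]
        self.throughputs = [1.0, 1.0, 3.0]

    def assertPredictionsClose(self, predictions, tolerance):
        """Checks one prediction per block, each within tolerance of its target."""
        self.assertEqual(len(predictions), len(self.throughputs))
        for predicted, expected in zip(predictions, self.throughputs):
            self.assertLessEqual(abs(predicted - expected), tolerance)


class ConstantModelTest(TinyDatasetTestCase):

    def test_constant_prediction_is_within_tolerance(self):
        # Predicting 1.0 everywhere is off by at most 2.0 on this data set.
        predictions = [1.0 for _ in self.blocks]
        self.assertPredictionsClose(predictions, tolerance=2.0)
```

Because the data set lives in setUp, each model's test class only has to produce predictions and call the shared assertion.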
Package: utils
Contains various utilities that do not fit into other packages.