MacSim
March 22, 2026
Introduction
MacSim is a trace-based cycle-level GPGPU simulator developed by HPArch at Georgia Institute of Technology.
- It simulates x86, ARM64, NVIDIA PTX, and Intel GEN GPU instructions and can be configured as either a trace-driven or an execution-driven cycle-level simulator. It models detailed micro-architectural behaviors, including pipeline stages, multi-threading, and memory systems.
- MacSim can simulate a variety of architectures, such as Intel's Sandy Bridge and Skylake (both CPUs and GPUs) and NVIDIA's Fermi. It supports homogeneous-ISA and heterogeneous-ISA multicore simulations, asymmetric multicore configurations (small cores + medium cores + big cores), and SMT or MT architectures.
- Currently, an interconnection network model (based on IRIS) and a power model (based on McPAT) are integrated.
- MacSim is also one of the components of SST, so multiple MacSim simulators can run concurrently.
- The project has been supported by Intel, NSF, and Sandia National Laboratories.
Table of Contents
- Note
- Intel GEN GPU Architecture
- Documentation
- Installation
- Quick Start
- Downloading Traces
- Generating Your Own Traces
- Known Bugs
- People
- Q & A
- Tutorial
- SST+MacSim
Note
- If you're interested in Intel's integrated GPU model in MacSim, please refer to the intel_gpu branch.
- We've developed a power model for GPU architecture using McPAT. Please refer to the following paper for more detailed information: Power Modeling for GPU Architecture using McPAT, by Jieun Lim, Nagesh B. Lakshminarayana, Hyesoon Kim, William Song, Sudhakar Yalamanchili, and Wonyong Sung, in Transactions on Design Automation of Electronic Systems (TODAES), Vol. 19, No. 3.
- We've characterised the performance of Intel's integrated GPUs using MacSim. Please refer to the following paper for more detailed information: Performance Characterisation and Simulation of Intel's Integrated GPU Architecture (ISPASS'18).
Intel GEN GPU Architecture
- Intel GEN9 GPU Architecture:

Documentation
Please see the MacSim documentation file for more detailed descriptions.
Installation
Prerequisites
- zlib (development library)
# Ubuntu/Debian
sudo apt install zlib1g-dev
# RHEL/CentOS/Fedora
sudo dnf install zlib-devel
- Python >= 3.11 and SCons (build tool)
uv venv
uv pip install scons
Optionally, activate the virtual environment so you can omit uv run:
source .venv/bin/activate
Clone and Build
git clone https://github.com/gthparch/macsim.git --recursive
cd macsim
./build.py --ramulator -j 32
# Or without activating the virtual environment:
uv run ./build.py --ramulator -j 32
For more build options, see ./build.py --help.
Quick Start
This section walks you through downloading a trace, setting up the simulation, and running it.
1. Download a Sample Trace
uv pip install gdown
gdown -O macsim_traces.tar.gz 1rpAgIMGJnrnXwDSiaM3S7hBysFoVhyO1
tar -xzf macsim_traces.tar.gz
rm macsim_traces.tar.gz
This will extract sample traces from the Rodinia benchmark suite into a macsim_traces/ directory.
2. Set Up a Run Directory
You need three files in the same directory to run a simulation:
- macsim: the binary executable
- params.in: GPU configuration
- trace_file_list: list of paths to GPU traces
Copy them from the build output:
mkdir run
cp bin/macsim bin/params.in bin/trace_file_list run/
cd run
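Before launching, it can help to confirm that all three files are actually present. The following is a minimal sketch; `check_run_dir` is a hypothetical helper name and is not part of MacSim.

```shell
# Hypothetical helper (not part of MacSim): verify that a run directory
# holds the three files listed above before launching the simulator.
check_run_dir() {
    dir=${1:-.}
    missing=0
    for f in macsim params.in trace_file_list; do
        if [ ! -e "$dir/$f" ]; then
            # Report each missing file on stderr and remember the failure.
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage, from inside run/:
#   check_run_dir . && ./macsim
```

The function returns a non-zero status if any file is missing, so it can be chained in front of the simulator invocation.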
3. Set Up the Trace Path
Edit trace_file_list. The first line is the number of traces, and the second line is the path to the trace:
1
/absolute/path/to/macsim_traces/hotspot/r512h2i2/kernel_config.txt
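The file can also be generated rather than edited by hand. The sketch below writes the count on the first line followed by the given paths; `write_trace_file_list` is a hypothetical helper, and the one-path-per-line layout for more than one trace is an assumption extrapolated from the single-trace example above.

```shell
# Hypothetical helper (not part of MacSim): write a trace_file_list with
# the number of traces on the first line, then one path per line.
# Assumption: multiple traces are listed one per line after the count.
write_trace_file_list() {
    out=$1
    shift
    {
        printf '%s\n' "$#"          # number of trace paths given
        for path in "$@"; do
            printf '%s\n' "$path"   # path to a kernel_config.txt
        done
    } > "$out"
}

# Usage:
#   write_trace_file_list trace_file_list \
#       /absolute/path/to/macsim_traces/hotspot/r512h2i2/kernel_config.txt
```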
4. Run
./macsim
Simulation results will appear in the current directory. For example, check general.stat.out for the total cycle count:
grep CYC_COUNT_TOT general.stat.out
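For scripting, it may be more convenient to extract just the value instead of the whole matching line. The sketch below assumes each statistic is on its own line with the name in the first whitespace-separated column and the value in the second; that layout is an assumption, so check it against your own general.stat.out. `get_stat` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of MacSim): print the second column of the
# first line whose first column equals the given statistic name.
# Assumption: stat files use a "NAME VALUE ..." whitespace-separated layout.
get_stat() {
    awk -v name="$1" '$1 == name { print $2; exit }' "${2:-general.stat.out}"
}

# Usage:
#   cycles=$(get_stat CYC_COUNT_TOT)
#   echo "total cycles: $cycles"
```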
Note: The parameter file must be named params.in. The macsim binary looks for this exact filename in the current directory.
5. Run All Benchmarks
To run all downloaded traces and verify the build:
mkdir -p test_run && cp bin/macsim bin/params.in test_run/
cd test_run
for trace in ../macsim_traces/*/; do
  name=$(basename "$trace")
  # Use the first kernel_config.txt one level down, else the one at the top level
  config=$(ls -d "$trace"/*/kernel_config.txt 2>/dev/null | head -1)
  [ -z "$config" ] && config=$(ls "$trace"/kernel_config.txt 2>/dev/null)
  [ -z "$config" ] && continue
  printf '1\n%s\n' "$(realpath "$config")" > trace_file_list
  result=$(timeout 120 ./macsim 2>&1 | grep "finalize" | head -1)
  echo "$name: $result"
done
Downloading Traces
Publicly Available Traces
| Dataset | Download |
|---|---|
| Rodinia | Download |
| PyTorch | Download |
| YOLOPv2 | Download |
| GPT2 | Download |
| GEMMA | Download |
Generating Your Own Traces
Warning: The trace generation tool is experimental — use at your own risk.
To generate traces for your own CUDA workloads, use the MacSim Tracer.
Simply prepend CUDA_INJECTION64_PATH to your original command. For example:
CUDA_INJECTION64_PATH=/path/to/main.so python3 your_cuda_program.py
Available environment variables:
| Variable | Description | Default |
|---|---|---|
| TRACE_PATH | Path to save trace files | ./ |
| KERNEL_BEGIN | First kernel to trace | 0 |
| KERNEL_END | Last kernel to trace | UINT32_MAX |
| INSTR_BEGIN | First instruction to trace per kernel | 0 |
| INSTR_END | Last instruction to trace per kernel | UINT32_MAX |
| COMPRESSOR_PATH | Path to the compressor binary | (built with tracer) |
| DEBUG_TRACE | Generate human-readable debug traces | 0 |
| OVERWRITE | Overwrite existing traces | 0 |
| TOOL_VERBOSE | Enable verbose output | 0 |
See the MacSim Tracer README for full installation and usage instructions.
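The variables in the table can be combined on the same command line as CUDA_INJECTION64_PATH. The sketch below wraps that in a function; `trace_first_kernels` is a hypothetical name, `/path/to/main.so` is the placeholder used above, and the kernel range and output directory are illustrative values. It assumes KERNEL_END is inclusive, as "last kernel to trace" suggests.

```shell
# Hypothetical wrapper (not part of the tracer): run the given command with
# tracing limited to kernels 0 through 9, saving traces under ./my_traces.
# Assumptions: KERNEL_END is inclusive; /path/to/main.so is a placeholder.
trace_first_kernels() {
    TRACE_PATH=./my_traces \
    KERNEL_BEGIN=0 \
    KERNEL_END=9 \
    CUDA_INJECTION64_PATH=/path/to/main.so \
    "$@"
}

# Usage:
#   trace_first_kernels python3 your_cuda_program.py
```

The assignments apply only to the launched command's environment, so they do not persist in the calling shell.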
Known Bugs
- src/memory.cc:1043: ASSERT FAILED. Happens with FasterTransformer traces and too many cores (40+). Solution: reduce the number of cores.
- src/factory_class.cc:77: ASSERT FAILED. Happens when the params.in file is missing or has a wrong name. Solution: use params.in as the config file name.
- src/process_manager.cc:826: ASSERT FAILED ... error opening trace file. Too many trace files are open simultaneously. Solution: add ulimit -n 16384 to your ~/.bashrc.
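To see whether the open-file limit is likely to be a problem before starting a long run, the current soft limit can be compared against the value suggested above. This is a minimal sketch; `check_nofile_limit` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of MacSim): compare the shell's current
# open-file limit against a target (default 16384, the value suggested above).
check_nofile_limit() {
    want=${1:-16384}
    have=$(ulimit -n)
    if [ "$have" = "unlimited" ] || [ "$have" -ge "$want" ]; then
        echo "ok: open-file limit is $have"
    else
        echo "low: open-file limit is $have; consider: ulimit -n $want"
        return 1
    fi
}

# Usage:
#   check_nofile_limit || ulimit -n 16384
```

Raising the limit with ulimit affects only the current shell and its children, which is why the fix above places it in ~/.bashrc.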
People
- Prof. Hyesoon Kim (Project Leader) at the Georgia Tech HPArch research group (http://hparch.gatech.edu/people.hparch)
Q & A
If you have a question, please open a GitHub issue.
Tutorial
- We gave a tutorial at HPCA-2012. Please visit here for the slides.
- We gave a tutorial at ISCA-2012. Please visit here for the slides.
SST+MacSim
- Here are two example configurations of SST+MacSim.
- A multi-socket system with cache coherence model:

- A CPU+GPU heterogeneous system with shared memory:
