Using the Tracer Tool for Accel-Sim
July 16, 2025
This document explains how to use tracer_tool to generate instruction traces for GPU applications. It covers tracing full benchmark suites and individual applications, restricting tracing to specific kernels or code regions, mapping trace entries to source lines, and the trace file format.
Setup and Installation
Before using the tracer, make sure to install and build the required tools:
# Install NVBit
./install_nvbit.sh
# Compile tracer tools
make
Option 1: Trace a Full Benchmark Suite
Use this if you're tracing an entire benchmark suite (e.g., Rodinia):
./run_hw_trace.py -B rodinia-3.1 -D 0
- -B: Benchmark suite name (app list can be found or defined in this file)
- -D: Hardware device ID (e.g., GPU 0)
Traces will be stored in:
../../hw_run/traces/device-0/
This script handles trace generation, post-processing, and cleanup automatically.
See the Trace File Structure section for details on the output.
Option 2: Trace an Individual Application
Use this approach if you want to trace a specific application binary (e.g., vectoradd):
export CUDA_VISIBLE_DEVICES=0
LD_PRELOAD=./tracer_tool/tracer_tool.so ./nvbit_release/test-apps/vectoradd/vectoradd
Traces will appear in the traces/ folder.
See the Trace File Structure section for contents.
Note: Unlike Option 1, you must run post-processing manually:
./tracer_tool/traces-processing/post-traces-processing ./traces/kernelslist
This will generate .traceg and kernelslist.g files that are ready for Accel-Sim.
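The two manual steps of Option 2 can be wrapped in a small helper. The following is a minimal sketch, not part of the tracer: the names run, trace_app, and DRY_RUN are hypothetical, and the paths are copied from the commands above, so it assumes it is sourced and called from the same directory.

```shell
# Sketch: trace one application, then post-process its traces.
# run, trace_app and DRY_RUN are illustrative names, not part of the tracer.

run() {
    # Print the command; execute it only when DRY_RUN is not set to 1.
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

trace_app() {
    device="$1"
    shift
    # env sets the variables only for the traced process, so the tracer
    # library is not preloaded into the rest of the shell session.
    run env CUDA_VISIBLE_DEVICES="$device" \
        LD_PRELOAD=./tracer_tool/tracer_tool.so "$@" || return 1
    # Post-processing is manual for individually traced applications.
    run ./tracer_tool/traces-processing/post-traces-processing ./traces/kernelslist
}
```

For example, `DRY_RUN=1 trace_app 0 ./nvbit_release/test-apps/vectoradd/vectoradd` prints the two commands without running them; dropping DRY_RUN=1 runs them for real.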
Trace File Structure
Each trace folder contains:
- kernel-*.trace: Raw trace files (one per kernel)
- kernelslist: List of traced kernels and CUDA memcpy operations
- stats.csv: Summary statistics (instruction counts, kernel IDs, etc.)
After post-processing:
- .traceg: Grouped trace files by thread block
- kernelslist.g: Final trace list for use with Accel-Sim
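A quick way to see which of these files a trace folder contains is a small check like the one below. This is only a sketch: check_trace_dir is a hypothetical helper, and it assumes all files sit directly inside the folder (not in subdirectories) and that grouped traces end in .traceg.

```shell
# Sketch: report which of the files listed above exist in a trace folder.
# check_trace_dir is an illustrative helper, not part of the tracer.
check_trace_dir() {
    dir="${1:-./traces}"
    missing=0
    for f in kernelslist stats.csv; do
        if [ -f "$dir/$f" ]; then
            echo "found    $f"
        else
            echo "missing  $f"
            missing=1
        fi
    done
    # Count raw per-kernel traces and grouped (post-processed) traces.
    raw=$(find "$dir" -maxdepth 1 -name 'kernel-*.trace' | wc -l | tr -d ' ')
    grouped=$(find "$dir" -maxdepth 1 -name '*.traceg' | wc -l | tr -d ' ')
    echo "raw traces:     $raw"
    echo "grouped traces: $grouped"
    if [ -f "$dir/kernelslist.g" ]; then
        echo "post-processed: yes"
    else
        echo "post-processed: no"
    fi
    # Non-zero status when kernelslist or stats.csv is absent.
    return "$missing"
}
```

Calling `check_trace_dir ./traces` after an Option 2 run shows at a glance whether post-processing still needs to be done.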
Selective Tracing (Kernel-Based Filtering)
You can now trace specific kernels using the DYNAMIC_KERNEL_RANGE environment variable.
Usage
Set DYNAMIC_KERNEL_RANGE to specify which kernel IDs (and optionally names) to trace. Supported formats:
- Single ID:
export DYNAMIC_KERNEL_RANGE="3"
Traces only kernel 3.
- Range:
export DYNAMIC_KERNEL_RANGE="5-8"
Traces kernels 5 through 8 (inclusive).
- Open-ended Range:
export DYNAMIC_KERNEL_RANGE="10-"
Traces from kernel 10 onward.
- Multiple Ranges (space-separated):
export DYNAMIC_KERNEL_RANGE="2 5-8 10-"
- With Name Filters (regex):
export DYNAMIC_KERNEL_RANGE="5-8@kernel_a.*,kernel_b.*"
Traces kernels 5 through 8 only if their names match kernel_a.* or kernel_b.*.
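To check a range string before a long tracing run, the selection rules listed above can be mimicked in a few lines of shell. This is a sketch of the documented behavior, not the tracer's own parser: kernel_selected is a hypothetical helper, and it assumes that a name filter applies only to the entry it is attached to and that each regex must match the whole kernel name.

```shell
# Sketch of the selection rules described above; not the tracer's own parser.
# Usage: kernel_selected ID NAME SPEC   (exit status 0 means "would be traced")
# The body runs in a subshell so that 'set -f' and variables do not leak.
kernel_selected() (
    id="$1"; name="$2"; spec="$3"
    set -f                                  # keep '*' in regexes from globbing
    for entry in $spec; do                  # entries are space-separated
        range="${entry%%@*}"                # part before an optional '@'
        filters=""
        case "$entry" in *@*) filters="${entry#*@}" ;; esac

        case "$range" in
            *-*) lo="${range%%-*}"; hi="${range#*-}" ;;   # "5-8" or "10-"
            *)   lo="$range"; hi="$range" ;;              # single ID
        esac
        [ "$id" -ge "$lo" ] || continue
        # An empty upper bound means the range is open-ended.
        [ -z "$hi" ] || [ "$id" -le "$hi" ] || continue

        # No name filter on this entry: the ID match is enough.
        [ -z "$filters" ] && exit 0
        # Assumption: each comma-separated regex must match the whole name.
        for re in $(printf '%s' "$filters" | tr ',' ' '); do
            if printf '%s\n' "$name" | grep -Eqx -- "$re"; then
                exit 0
            fi
        done
    done
    exit 1
)
```

For example, `kernel_selected 6 kernel_b_fwd "5-8@kernel_a.*,kernel_b.*" && echo traced` prints traced, while the same call with ID 9 prints nothing.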
To Disable Tracing But Still List Kernels
To list kernel metadata in stats.csv without generating traces, set DYNAMIC_KERNEL_RANGE to a kernel ID far larger than any the application will launch:
export DYNAMIC_KERNEL_RANGE="1000000"
This is useful for discovering kernel IDs and names without producing large trace files.
Alternative: Trace Specific Code Regions with CUDA Profiling
Wrap the regions you want to trace in CUDA profiler API calls:
cudaProfilerStart();
// region to trace
cudaProfilerStop();
Then disable tracing from program start, so that only the wrapped regions are traced:
export ACTIVE_FROM_START=0
Trace Source Line Mapping
Enable source line information in your traces:
- Set environment variable:
export TRACE_LINEINFO=1
- Rebuild benchmark applications:
source ./gpu-app-collection/src/setup_environment
make -j -C ./gpu-app-collection/src rodinia_2.0-ft
Traces will now include line numbers from the original CUDA source. This requires the applications to be compiled with the NVCC -lineinfo flag.
Trace Format Explanation
Each instruction has at least 10 required columns:
[line_num] PC mask dest_num [reg_dests] opcode src_num [reg_srcs] mem_width [addresscompress?] [mem_addresses]
Details:
- Fields in [] are optional and appear only if applicable.
- dest_num = 0 → no destination register field.
- mem_width = 0 → no memory address info present.
Example:
31 0 0 3 0000 ffffffff 1 R1 IMAD.MOV.U32 2 R255 R255 0
This line represents:
- Threadblock: (31, 0, 0)
- PC: 0000, Mask: ffffffff
- One destination register: R1
- Opcode: IMAD.MOV.U32
- Two source registers: R255, R255
- Not a memory instruction (mem_width = 0)
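The variable-length layout is easier to follow when the line is split mechanically: dest_num and src_num say how many register columns follow them. The sketch below does this with awk for a line shaped like the example. parse_trace_line is a hypothetical helper, and it rests on two assumptions not stated above: the fourth column (3 in the example) is the warp ID within the thread block, and the line carries no leading line_num field.

```shell
# Sketch: split one raw trace line into named fields, following the example above.
# Assumptions: column 4 is the warp ID within the thread block, and there is
# no leading line_num column (i.e. line info was not enabled).
parse_trace_line() {
    printf '%s\n' "$1" | awk '{
        printf "threadblock=(%s,%s,%s)\n", $1, $2, $3
        printf "warp=%s\npc=%s\nmask=%s\n", $4, $5, $6

        n = 7; dest_num = $n; dests = ""          # dest_num registers follow
        for (i = 1; i <= dest_num; i++)
            dests = dests (i > 1 ? " " : "") $(n + i)

        n += dest_num + 1; opcode = $n            # opcode comes after the dests

        n++; src_num = $n; srcs = ""              # src_num registers follow
        for (i = 1; i <= src_num; i++)
            srcs = srcs (i > 1 ? " " : "") $(n + i)

        n += src_num + 1; mem_width = $n          # 0 means no memory fields
        printf "dests=%s\nopcode=%s\nsrcs=%s\nmem_width=%s\n", dests, opcode, srcs, mem_width
    }'
}

parse_trace_line '31 0 0 3 0000 ffffffff 1 R1 IMAD.MOV.U32 2 R255 R255 0'
```

For the example line this reports dests=R1, opcode=IMAD.MOV.U32, srcs=R255 R255 and mem_width=0, matching the breakdown above. Memory address fields that follow a non-zero mem_width are left unparsed in this sketch.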