Code Generation
May 20, 2026 ยท View on GitHub
Overview
rv32emu employs a tiered execution strategy with an interpreter and a two-tier JIT compiler. The interpreter provides baseline execution while the JIT compiler generates native machine code for hot paths, significantly improving performance for compute-intensive workloads.
The code generation infrastructure supports both x86-64 and Arm64 host architectures through an abstraction layer that maps common operations to architecture-specific instructions.
Execution Tiers
The emulator uses three execution tiers:
- Interpreter: Direct execution of RISC-V instructions via tail-call threaded dispatch
- Tier-1 JIT: Template-based native code generation for basic blocks
- Tier-2 JIT: LLVM-based compilation for frequently executed hot paths
When a basic block reaches a configurable execution threshold, it gets promoted from interpreter to Tier-1 JIT. Blocks that continue to execute frequently may be further compiled by the Tier-2 LLVM backend for additional optimizations.
Source File Organization
| File | Purpose |
|---|---|
src/rv32_template.c | Interpreter instruction implementations using RVOP macro |
src/rv32_v_template.c | Vector (V) extension interpreter handlers (experimental, decode + partial execution) |
src/rv32_jit.c | Tier-1 JIT code generators using GEN macro (included by jit.c) |
src/rv32_constopt.c | IR-level constant folding and optimization |
src/rv32_v_constopt.c | Constant-folding hooks for the V extension |
src/jit.c | Tier-1 JIT infrastructure, emit_* API, and fused instruction handlers |
src/t2c.c | Tier-2 JIT driver (includes t2c_template.c) |
src/t2c_template.c | Tier-2 JIT instruction handlers using T2C_OP macro |
src/emulate.c | Main execution loop, tail-call dispatch, and macro-op fusion |
Interpreter Implementation
The interpreter uses the RVOP macro to define instruction handlers.
Each handler receives the emulator state and decoded instruction, then performs the operation directly in C:
RVOP(name, { body })
This expands to a function that:
- Receives: rv (emulator state), ir (decoded instruction), cycle (counter), PC
- Returns: bool indicating whether to continue execution
Example implementation for the addi instruction:
RVOP(addi, { rv->X[ir->rd] = rv->X[ir->rs1] + ir->imm; })
The interpreter uses Tail Call Threaded Code (TCTC) for efficient instruction dispatch.
Each handler ends with a tail call (using the musttail attribute) to the next instruction's handler,
avoiding function call overhead and enabling better branch prediction compared to switch-based dispatch.
Tier-1 JIT Code Generation
The Tier-1 JIT compiler uses the GEN macro to define native code generators:
GEN(name, { body })
Each generator emits native machine code that performs the equivalent operation using host CPU instructions.
The emit_* functions append bytes to the JIT buffer.
Example: The addi instruction generates a host instruction sequence like:
mov VR0, [memory address of (rv->X + rs1)]
mov VR1, VR0
add VR1, imm
Note: This is conceptual assembly. Actual instruction generation depends on the dynamic register allocator state and whether source registers are already mapped.
Register Allocation (Tier-1)
The Tier-1 JIT maintains a register mapping between RISC-V and host registers:
| Register | Purpose |
|---|---|
vm_reg[0..2] | Host registers mapped to VM registers for current operation |
temp_reg | Scratch register for intermediate calculations |
parameter_reg[0] | Points to the riscv_t structure |
Register allocation is performed dynamically. When a RISC-V register value is needed, the allocator either returns an already-mapped host register or loads the value from memory into an available host register.
Tier-2 JIT Compilation
The Tier-2 JIT uses LLVM to compile frequently executed blocks into highly optimized native code.
Instruction handlers are defined in src/t2c_template.c using the T2C_OP macro:
T2C_OP(name, { body })
Each handler translates the RISC-V instruction semantics into LLVM IR using the LLVM C API. LLVM then applies its optimization passes and register allocation, producing native code that typically outperforms Tier-1 for hot paths.
Tier-2 compilation requires LLVM 18-21 (LLVM 20+ is the validated default
exercised by CI on macOS arm64 and Ubuntu 24.04 x86-64). The Makefile
auto-detects llvm-config in $PATH (preferring the newest supported
version) and the matching Homebrew prefix; override with
make LLVM_CONFIG=/path/to/llvm-config to pin a specific install. Enable
T2C via make jit_defconfig (which selects both CONFIG_JIT and
CONFIG_T2C), through make config, or with the legacy ENABLE_JIT=1
shim. See build.md for the full configuration matrix.
IR Optimization
Before execution or JIT compilation,
the emulator performs optimization passes on the internal instruction representation (IR).
Defined in src/rv32_constopt.c,
these passes analyze basic blocks to identify opportunities for constant propagation and folding.
For example, a lui followed by addi to construct a 32-bit constant can be optimized to reduce runtime computation.
This optimization benefits both the interpreter and JIT tiers.
Code Generation Macros (Tier-1)
Helper macros reduce code duplication for common Tier-1 instruction patterns:
| Macro | Instructions |
|---|---|
GEN_BRANCH | beq, bne, blt, bge, bltu, bgeu |
GEN_CBRANCH | cbeqz, cbnez (compressed) |
GEN_ALU_IMM | addi, xori, ori, andi |
GEN_ALU_REG | add, sub, xor, or, and |
GEN_SHIFT_IMM | slli, srli, srai |
GEN_SHIFT_REG | sll, srl, sra |
GEN_SLT_IMM | slti, sltiu |
GEN_SLT_REG | slt, sltu |
GEN_LOAD | lb, lh, lw, lbu, lhu |
GEN_STORE | sb, sh, sw |
Each macro encapsulates the common code generation pattern for its instruction class, taking only the instruction-specific parameters (opcode, condition code, etc.).
Dual-Architecture Support
The JIT supports both x86-64 and Arm64 through an abstraction layer in jit.c.
Constants use x86-64 encoding values but serve as symbolic identifiers on both architectures.
The emit_* functions contain architecture-specific implementations guarded by preprocessor conditionals:
#if defined(__x86_64__)
// x86-64 specific code generation
#elif defined(__aarch64__)
// Arm64 specific code generation
#endif
For example, jump condition codes (JCC_JE, JCC_JNE, etc.) match x86-64 Jcc opcodes
but are mapped to equivalent Arm64 condition codes by emit_jcc_offset().
Emit API (Tier-1)
The emit_* functions provide the low-level interface for Tier-1 machine code generation:
| Function | Description |
|---|---|
emit_alu32_imm32 | ALU operation with 32-bit immediate |
emit_alu32_imm8 | ALU operation with 8-bit immediate |
emit_alu32 | ALU operation between registers |
emit_load_imm | Load immediate value into register |
emit_load / emit_store | Memory access operations |
emit_mov | Register-to-register move |
emit_cmp32 / emit_cmp_imm32 | Comparison operations |
emit_jcc_offset | Conditional jump |
emit_jmp | Unconditional jump |
emit_call | Function call to runtime helper |
Block Chaining
Translated blocks can be chained together to avoid returning to the dispatcher between consecutive blocks. When a block ends with a direct branch to another translated block, the JIT patches the branch target to jump directly to the target block's native code.
Block chaining is controlled by the ENABLE_BLOCK_CHAINING configuration option.
Macro-op Fusion
The emulator fuses common instruction sequences into single operations,
benefiting both the interpreter and JIT tiers. Fusion is implemented in src/emulate.c
via match_pattern and do_fuse* handlers, which transform the IR before execution.
For example, a lui followed by addi to construct a 32-bit constant can be fused into a single load-immediate operation. The fused instructions are then handled by dedicated handlers in each tier:
- Interpreter:
do_fuse*functions insrc/emulate.c - Tier-1 JIT:
do_fuse*functions insrc/jit.c - Tier-2 JIT:
T2C_OP(fuse*, ...)handlers insrc/t2c_template.c
Macro-op fusion is controlled by the ENABLE_MOP_FUSION configuration option.
Memory Access and MMIO
Load and store instructions check for memory-mapped I/O regions when system emulation is enabled.
The GEN_LOAD and GEN_STORE macros include conditional code generation for MMIO handling:
IIF(RV32_HAS(SYSTEM_MMIO))(
// MMIO check and handler call
,
// Direct memory access
)
When MMIO is not enabled, the generated code performs direct memory access without the overhead of region checking.