QEMU Tiny-Code Threaded Interpreter (AArch64)
April 7, 2021
A TCG backend that chains together JOP/ROP-ish gadgets to massively reduce interpreter overhead vs TCI. It is platform-dependent, but usable when JIT isn't available, e.g. on platforms that lack WX mappings. The general idea is to squish the addresses of a gadget sequence into a "queue" and then write each gadget so it ends in a "dequeue-jump".
Execution occurs by jumping into the first gadget, and letting it just play back some linear-overhead native code sequences for a while.
Since TCG-TCI is optimized for sets of 16 GP registers and AArch64 has 31 (x0-x30), we could easily keep JIT/QEMU and guest state separate, and since 16*16 is reasonably small we could actually have a dedicated gadget for each combination of operands.
Register Convention
| Regs | Use |
|---|---|
| x0-x15 | Guest Registers |
| x24 | TCTI temporary |
| x25 | saved IP during call |
| x26 | TCTI temporary |
| x27 | TCTI temporary |
| x28 | Thread-stream pointer |
| x30 | Link register |
| SP | Stack Pointer, host |
| PC | Program Counter, host |
In pseudocode:
| Symbol | Meaning |
|---|---|
| Rd | stand-in for destination register |
| Rn | stand-in for first source register |
| Rm | stand-in for second source register |
Gadget Structure
End of gadget
Each gadget ends by advancing our bytecode pointer, and then executing from the new location.
# Load our next gadget address from our bytecode stream, advancing it, and jump to the next gadget.
ldr x27, [x28], #8
br x27
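The dequeue-jump pattern can be modelled in a few lines of Python. This is only a toy sketch: the gadget factories (`make_movi`, `make_add`), the list-based stream, and the `run` loop are illustrative assumptions, and the loop stands in for the native `ldr`/`br` pair because Python has no computed jumps.

```python
# Toy model of threaded dispatch. The names and the list-based stream are
# illustrative assumptions, not the real TCTI data layout.

def make_movi(rd):
    def gadget(regs, stream, pos):
        regs[rd] = stream[pos]      # consume the inline immediate
        return pos + 1              # advance past it
    return gadget

def make_add(rd, rn, rm):
    def gadget(regs, stream, pos):
        regs[rd] = regs[rn] + regs[rm]
        return pos                  # no immediates to consume
    return gadget

def exit_tb(regs, stream, pos):
    return None                     # stop dispatching

def run(stream):
    regs = [0] * 16                 # stand-in for the guest registers
    pos = 0                         # stand-in for the stream pointer (x28)
    while pos is not None:
        gadget = stream[pos]                   # ldr x27, [x28], #8
        pos = gadget(regs, stream, pos + 1)    # br x27
    return regs

stream = [make_movi(1), 40, make_movi(2), 2, make_add(0, 1, 2), exit_tb]
print(run(stream)[0])  # prints 42
```

Each gadget only knows how to do its own work and how to hand control to whatever comes next in the stream, which is what keeps the per-operation overhead small.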
Calling into QEMU's C codebase
When calling into C, we lose control over which registers are used. Accordingly, we'll need to save registers relevant to TCTI:
str x25, [sp, #-16]!
stp x14, x15, [sp, #-16]!
stp x12, x13, [sp, #-16]!
stp x10, x11, [sp, #-16]!
stp x8, x9, [sp, #-16]!
stp x6, x7, [sp, #-16]!
stp x4, x5, [sp, #-16]!
stp x2, x3, [sp, #-16]!
stp x0, x1, [sp, #-16]!
stp x28, lr, [sp, #-16]!
Upon returning to the gadget stream, we'll then restore them.
ldp x28, lr, [sp], #16
ldp x0, x1, [sp], #16
ldp x2, x3, [sp], #16
ldp x4, x5, [sp], #16
ldp x6, x7, [sp], #16
ldp x8, x9, [sp], #16
ldp x10, x11, [sp], #16
ldp x12, x13, [sp], #16
ldp x14, x15, [sp], #16
ldr x25, [sp], #16
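Since the two sequences above are mirror images of each other, they could plausibly be generated rather than written by hand. The sketch below is one way to do that; the function names are hypothetical and chosen only to echo the `C_CALL_PROLOGUE` / `C_CALL_EPILOGUE` names used later.

```python
def c_call_prologue():
    """Push x25, then x15..x0 in pairs, then x28/lr (hypothetical helper)."""
    lines = ["str x25, [sp, #-16]!"]
    for lo in range(14, -1, -2):
        lines.append(f"stp x{lo}, x{lo + 1}, [sp, #-16]!")
    lines.append("stp x28, lr, [sp, #-16]!")
    return lines

def c_call_epilogue():
    """Pop everything in the reverse order of the prologue."""
    lines = ["ldp x28, lr, [sp], #16"]
    for lo in range(0, 16, 2):
        lines.append(f"ldp x{lo}, x{lo + 1}, [sp], #16")
    lines.append("ldr x25, [sp], #16")
    return lines

print("\n".join(c_call_prologue()))
print("\n".join(c_call_epilogue()))
```

Every slot is 16 bytes wide, including the lone `x25`, so SP stays 16-byte aligned throughout.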
TCG Operations
Each operation needs an implementation for every platform, and probably a set of gadgets for each possible set of operands.
At 16 GP registers, that means that:
- 1 operand => 16 gadgets
- 2 operands => 256 gadgets
- 3 operands => 4096 gadgets
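The counts are simply 16 raised to the number of register operands. A small sketch that enumerates the variants makes this concrete; the `gadget_<op>_r<n>` naming scheme is an assumption made up for illustration.

```python
from itertools import product

GUEST_REGS = 16

def gadget_count(operands):
    # One gadget per combination of register operands.
    return GUEST_REGS ** operands

def gadget_names(op, operands):
    # Hypothetical naming scheme: gadget_<op>_r<d>_r<n>_r<m>
    return [
        f"gadget_{op}_" + "_".join(f"r{r}" for r in combo)
        for combo in product(range(GUEST_REGS), repeat=operands)
    ]

for n in (1, 2, 3):
    print(n, gadget_count(n))
print(gadget_names("add", 3)[:2])
```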
call
Calls a helper function by address.
IR Format: call <ptr address>
Gadget type: single
# Get our C runtime function's location as a pointer-sized immediate...
"ldr x27, [x28], #8",
# Store our TB return address for our helper. This is necessary so the GETPC()
# macro works correctly as used in helper functions.
"str x28, [x25]",
# Prepare ourselves to call into our C runtime...
*C_CALL_PROLOGUE,
# ... perform the call itself ...
"blr x27",
# Save the result of our call for later.
"mov x27, x0",
# ... and restore our environment.
*C_CALL_EPILOGUE,
# Restore our return value.
"mov x0, x27"
br
Branches to a given immediate address. Branches are
IR Format: br <ptr address>
Gadget type: single
# Use our immediate argument as our new bytecode-pointer location.
ldr x28, [x28]
setcond_i32
Performs a comparison between two 32-bit operands.
IR Format: setcond32 <cond>, Rd, Rn, Rm
Gadget type: treated as 10 operations with variants for every Rd/Rn/Rm (40,960)
subs Wd, Wn, Wm
cset Wd, <cond>
| QEMU Cond | AArch64 Cond |
|---|---|
| EQ | EQ |
| NE | NE |
| LT | LT |
| GE | GE |
| LE | LE |
| GT | GT |
| LTU | LO |
| GEU | HS |
| LEU | LS |
| GTU | HI |
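The chart translates directly into a lookup table that a gadget generator could use. The following is a sketch under assumptions: the dictionary and the `setcond_i32` helper are hypothetical names, and the emitted text just mirrors the two-instruction template above.

```python
# QEMU condition name -> AArch64 condition code (from the chart above).
QEMU_TO_AARCH64_COND = {
    "EQ": "eq", "NE": "ne",
    "LT": "lt", "GE": "ge", "LE": "le", "GT": "gt",
    "LTU": "lo", "GEU": "hs", "LEU": "ls", "GTU": "hi",
}

def setcond_i32(cond, rd, rn, rm):
    """Return the body of one setcond_i32 gadget (hypothetical helper)."""
    cc = QEMU_TO_AARCH64_COND[cond]
    return [f"subs w{rd}, w{rn}, w{rm}", f"cset w{rd}, {cc}"]

print(setcond_i32("LTU", 0, 1, 2))
# 10 conditions times 16^3 register combinations:
print(len(QEMU_TO_AARCH64_COND) * 16 ** 3)  # prints 40960
```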
setcond_i64
Performs a comparison between two 64-bit operands.
IR Format: setcond64 <cond>, Rd, Rn, Rm
Gadget type: treated as 10 operations with variants for every Rd/Rn/Rm (40,960)
subs Xd, Xn, Xm
cset Xd, <cond>
Comparison chart is the same as the _i32 variant.
brcond_i32
Compares two 32-bit numbers, and branches if the comparison is true.
IR Format: brcond Rn, Rm, <cond>
Gadget type: treated as 10 operations with variants for every Rn/Rm (2560)
# Perform our comparison and conditional branch.
subs wzr, Wn, Wm
b.<cond> taken
# Consume the branch target, without using it.
add x28, x28, #8
# Perform our end-of-instruction epilogue.
<epilogue here>
taken:
# Update our bytecode pointer to take the label.
ldr x28, [x28]
Comparison chart is the same as in setcond_i32.
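To make the taken / not-taken paths concrete, here is a small Python model of the gadget's behaviour. It is a sketch, not the implementation: the stream is modelled as a list whose slot at `pos` holds the branch target, and `cond_holds` is a hypothetical helper that evaluates the ten conditions on 32-bit values.

```python
def to_signed32(v):
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v

def cond_holds(cond, a, b):
    ua, ub = a & 0xFFFFFFFF, b & 0xFFFFFFFF      # unsigned views
    sa, sb = to_signed32(a), to_signed32(b)      # signed views
    return {
        "EQ": ua == ub, "NE": ua != ub,
        "LT": sa < sb, "GE": sa >= sb, "LE": sa <= sb, "GT": sa > sb,
        "LTU": ua < ub, "GEU": ua >= ub, "LEU": ua <= ub, "GTU": ua > ub,
    }[cond]

def brcond_i32(cond, a, b, stream, pos):
    """Return the next stream position; stream[pos] is the branch target."""
    if cond_holds(cond, a, b):
        return stream[pos]      # taken: ldr x28, [x28]
    return pos + 1              # not taken: add x28, x28, #8

print(cond_holds("LT", 0xFFFFFFFF, 1))   # True: -1 < 1 when signed
print(cond_holds("LTU", 0xFFFFFFFF, 1))  # False: 0xFFFFFFFF > 1 when unsigned
```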
brcond_i64
Compares two 64-bit numbers, and branches if the comparison is true.
IR Format: brcond Rn, Rm, <cond>
Gadget type: treated as 10 operations with variants for every Rn/Rm (2560)
# Perform our comparison and conditional branch.
subs xzr, Xn, Xm
b.<cond> taken
# Consume the branch target, without using it.
add x28, x28, #8
# Perform our end-of-instruction epilogue.
<epilogue here>
taken:
# Update our bytecode pointer to take the label.
ldr x28, [x28]
Comparison chart is the same as in setcond_i32.
mov_i32
Moves a value from a register to another register.
IR Format: mov Rd, Rn
Gadget type: gadget per Rd + Rn combo (256)
mov Wd, Wn
mov_i64
Moves a value from a register to another register.
IR Format: mov Rd, Rn
Gadget type: gadget per Rd + Rn combo (256)
mov Xd, Xn
tci_movi_i32
Moves a 32b immediate into a register.
IR Format: mov Rd, #imm32
Gadget type: gadget per Rd (16)
ldr w27, [x28], #4
mov Wd, w27
tci_movi_i64
Moves a 64b immediate into a register.
IR Format: mov Rd, #imm64
Gadget type: gadget per Rd (16)
ldr x27, [x28], #8
mov Xd, x27
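Inline immediates live in the gadget stream itself, so the stream pointer has to advance by the immediate's size: 4 bytes for a 32-bit value and 8 bytes for a 64-bit one. The sketch below models that with `struct`; the helper names are hypothetical and a little-endian host is assumed.

```python
import struct

def read_imm32(stream, offset):
    # Models: ldr w27, [x28], #4
    (value,) = struct.unpack_from("<I", stream, offset)
    return value, offset + 4

def read_imm64(stream, offset):
    # Models a 64-bit post-indexed load that advances the pointer by 8.
    (value,) = struct.unpack_from("<Q", stream, offset)
    return value, offset + 8

stream = struct.pack("<I", 0xDEADBEEF) + struct.pack("<Q", 0x0123456789ABCDEF)
imm32, offset = read_imm32(stream, 0)
imm64, offset = read_imm64(stream, offset)
print(hex(imm32), hex(imm64), offset)
```

Advancing by the wrong amount would leave the pointer in the middle of an immediate, and the next dequeue-jump would branch to garbage.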
ld8u_i32 / ld8u_i64
Load byte from host memory to register.
IR Format: ldr Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
ldrb Wd, [Xn, x27]
ld8s_i32 / ld8s_i64
Load byte from host memory to register; sign extending.
IR Format: ldr Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
ldrsb Xd, [Xn, x27]
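The load gadgets share one shape: fetch a signed 32-bit offset from the stream (`ldrsw`), then access host memory at base plus offset. The only difference between the `u` and `s` variants is how the loaded value is widened. A toy model, with host memory as a `bytes` object and hypothetical helper names:

```python
import struct

def read_offset(stream, pos):
    # Models: ldrsw x27, [x28], #4 (sign-extending 32-bit load)
    (off,) = struct.unpack_from("<i", stream, pos)
    return off, pos + 4

def ld8u(mem, base, off):
    return mem[base + off]                # zero-extended byte

def ld8s(mem, base, off):
    b = mem[base + off]
    return b - 256 if b & 0x80 else b     # sign-extended byte

mem = bytes([0x7F, 0xFF, 0x01])
stream = struct.pack("<i", -1)            # a negative offset is valid
off, pos = read_offset(stream, 0)
print(ld8u(mem, 2, off), ld8s(mem, 2, off))  # prints 255 -1
```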
ld16u_i32 / ld16u_i64
Load 16b from host memory to register.
IR Format: ldr Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
ldrh Wd, [Xn, x27]
ld16s_i32 / ld16s_i64
Load 16b from host memory to register; sign extending.
IR Format: ldr Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
ldrsh Xd, [Xn, x27]
ld32u_i32 / ld32u_i64
Load 32b from host memory to register.
IR Format: ldr Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
ldr Wd, [Xn, x27]
ld32s_i64
Load 32b from host memory to register; sign extending.
IR Format: ldr Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
ldrsw Xd, [Xn, x27]
ld_i64
Load 64b from host memory to register.
IR Format: ldr Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
ldr Xd, [Xn, x27]
st8_i32 / st8_i64
Stores byte from register to host memory.
IR Format: str Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
strb Wd, [Xn, x27]
st16_i32 / st16_i64
Stores 16b from register to host memory.
IR Format: str Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
strh Wd, [Xn, x27]
st_i32 / st32_i64
Stores 32b from register to host memory.
IR Format: str Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
str Wd, [Xn, x27]
st_i64
Stores 64b from register to host memory.
IR Format: str Rd, Rn, <signed offset>
Gadget type: gadget per Rd & Rn (256)
ldrsw x27, [x28], #4
str Xd, [Xn, x27]
qemu_ld_i32
Loads 32b from guest memory to register.
IR Format: ld Rd, <foreign/guest pointer>, <memory operation>
Gadget type: thunk per Rd into C impl?
qemu_ld_i64
Loads 64b from guest memory to register.
IR Format: ld Rd, <foreign/guest pointer>, <memory operation>
Gadget type: thunk per Rd into C impl?
qemu_st_i32
Stores 32b from a register to guest memory.
IR Format: st Rd, <foreign/guest pointer>, <memory operation>
Gadget type: thunk per Rd into C impl
qemu_st_i64
Stores 64b from a register to guest memory.
IR Format: st Rd, <foreign/guest pointer>, <memory operation>
Gadget type: thunk per Rd into C impl?
Note
See note on qemu_ld_i32.
add_i32
Adds two 32-bit numbers.
IR Format: add Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
add Wd, Wn, Wm
add_i64
Adds two 64-bit numbers.
IR Format: add Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
add Xd, Xn, Xm
sub_i32
Subtracts two 32-bit numbers.
IR Format: sub Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
sub Wd, Wn, Wm
sub_i64
Subtracts two 64-bit numbers.
IR Format: sub Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
sub Xd, Xn, Xm
mul_i32
Multiplies two 32-bit numbers.
IR Format: mul Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
mul Wd, Wn, Wm
mul_i64
Multiplies two 64-bit numbers.
IR Format: mul Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
mul Xd, Xn, Xm
div_i32
Divides two 32-bit numbers; considering them signed.
IR Format: div Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
sdiv Wd, Wn, Wm
div_i64
Divides two 64-bit numbers; considering them signed.
IR Format: div Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
sdiv Xd, Xn, Xm
divu_i32
Divides two 32-bit numbers; considering them unsigned.
IR Format: divu Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
udiv Wd, Wn, Wm
divu_i64
Divides two 64-bit numbers; considering them unsigned.
IR Format: divu Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
udiv Xd, Xn, Xm
rem_i32
Computes the division remainder (modulus) of two 32-bit numbers; considering them signed.
IR Format: rem Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
sdiv w27, Wn, Wm
msub Wd, w27, Wm, Wn
rem_i64
Computes the division remainder (modulus) of two 64-bit numbers; considering them signed.
IR Format: rem Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
sdiv x27, Xn, Xm
msub Xd, x27, Xm, Xn
remu_i32
Computes the division remainder (modulus) of two 32-bit numbers; considering them unsigned.
IR Format: remu Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
udiv w27, Wn, Wm
msub Wd, w27, Wm, Wn
remu_i64
Computes the division remainder (modulus) of two 64-bit numbers; considering them unsigned.
IR Format: remu Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
udiv x27, Xn, Xm
msub Xd, x27, Xm, Xn
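AArch64 has no remainder instruction, which is why the rem gadgets pair a divide with `msub` (which computes `Ra - Rn * Rm`). The sketch below checks the arithmetic in Python. It models truncating division and the AArch64 behaviour that dividing by zero yields 0, but it ignores fixed-width wraparound (e.g. the most negative value divided by -1); the function names are just stand-ins for the instructions.

```python
def sdiv(n, m):
    # AArch64 SDIV truncates toward zero and returns 0 on division by zero.
    if m == 0:
        return 0
    q = abs(n) // abs(m)
    return -q if (n < 0) != (m < 0) else q

def msub(rn, rm, ra):
    # msub Rd, Rn, Rm, Ra  =>  Rd = Ra - Rn * Rm
    return ra - rn * rm

def rem(n, m):
    # sdiv x27, Xn, Xm ; msub Xd, x27, Xm, Xn
    return msub(sdiv(n, m), m, n)

print(rem(7, 3), rem(-7, 3), rem(7, -3))  # prints 1 -1 1
```

Note that the result takes the sign of the dividend, matching C's `%` rather than Python's.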
not_i32
Logically inverts a 32-bit number.
IR Format: not Rd, Rn
Gadget type: gadget per Rd, Rn (256)
mvn Wd, Wn
not_i64
Logically inverts a 64-bit number.
IR Format: not Rd, Rn
Gadget type: gadget per Rd, Rn (256)
mvn Xd, Xn
neg_i32
Arithmetically inverts (two's complement) a 32-bit number.
IR Format: neg Rd, Rn
Gadget type: gadget per Rd, Rn (256)
neg Wd, Wn
neg_i64
Arithmetically inverts (two's complement) a 64-bit number.
IR Format: neg Rd, Rn
Gadget type: gadget per Rd, Rn (256)
neg Xd, Xn
and_i32
Logically ANDs two 32-bit numbers.
IR Format: and Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
and Wd, Wn, Wm
and_i64
Logically ANDs two 64-bit numbers.
IR Format: and Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
and Xd, Xn, Xm
or_i32
Logically ORs two 32-bit numbers.
IR Format: or Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
orr Wd, Wn, Wm
or_i64
Logically ORs two 64-bit numbers.
IR Format: or Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
orr Xd, Xn, Xm
xor_i32
Logically XORs two 32-bit numbers.
IR Format: xor Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
eor Wd, Wn, Wm
xor_i64
Logically XORs two 64-bit numbers.
IR Format: xor Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
eor Xd, Xn, Xm
shl_i32
Logically shifts a 32-bit number left.
IR Format: shl Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
lsl Wd, Wn, Wm
shl_i64
Logically shifts a 64-bit number left.
IR Format: shl Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
lsl Xd, Xn, Xm
shr_i32
Logically shifts a 32-bit number right.
IR Format: shr Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
lsr Wd, Wn, Wm
shr_i64
Logically shifts a 64-bit number right.
IR Format: shr Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
lsr Xd, Xn, Xm
sar_i32
Arithmetically shifts a 32-bit number right.
IR Format: sar Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
asr Wd, Wn, Wm
sar_i64
Arithmetically shifts a 64-bit number right.
IR Format: sar Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
asr Xd, Xn, Xm
rotl_i32
Rotates a 32-bit number left.
IR Format: rotl Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
neg w27, Wm
ror Wd, Wn, w27
rotl_i64
Rotates a 64-bit number left.
IR Format: rotl Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
neg x27, Xm
ror Xd, Xn, x27
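AArch64 only provides a register-controlled rotate to the right (`ror`, an alias of `rorv`), so a left rotate is typically expressed as a right rotate by the negated amount. A quick Python check of that identity for the 32-bit case, with hypothetical helper names:

```python
def ror32(v, n):
    n &= 31                      # rorv uses the amount modulo the width
    v &= 0xFFFFFFFF
    return ((v >> n) | (v << (32 - n))) & 0xFFFFFFFF

def rotl32(v, n):
    # neg w27, Wm ; ror Wd, Wn, w27
    return ror32(v, -n)          # -n & 31 == (32 - n) % 32

print(hex(rotl32(0x12345678, 8)))  # prints 0x34567812
```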
rotr_i32
Rotates a 32-bit number right.
IR Format: rotr Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
ror Wd, Wn, Wm
rotr_i64
Rotates a 64-bit number right.
IR Format: rotr Rd, Rn, Rm
Gadget type: gadget per Rd, Rn, Rm (4096)
ror Xd, Xn, Xm
deposit_i32
Optional; not currently implementing.
deposit_i64
Optional; not currently implementing.
ext8s_i32
Sign extends the lower 8b of a register into a 32b destination.
IR Format: ext8s Rd, Rn
Gadget type: gadget per Rd, Rn (256)
sxtb Wd, Wn
ext8s_i64
Sign extends the lower 8b of a register into a 64b destination.
IR Format: ext8s Rd, Rn
Gadget type: gadget per Rd, Rn (256)
sxtb Xd, Wn
ext8u_i32
Zero extends the lower 8b of a register into a 32b destination.
IR Format: ext8u Rd, Rn
Gadget type: gadget per Rd, Rn (256)
and Wd, Wn, #0xff
ext8u_i64
Zero extends the lower 8b of a register into a 64b destination.
IR Format: ext8u Rd, Rn
Gadget type: gadget per Rd, Rn (256)
and Xd, Xn, #0xff
ext16s_i32
Sign extends the lower 16b of a register into a 32b destination.
IR Format: ext16s Rd, Rn
Gadget type: gadget per Rd, Rn (256)
sxth Wd, Wn
ext16s_i64
Sign extends the lower 16b of a register into a 64b destination.
IR Format: ext16s Rd, Rn
Gadget type: gadget per Rd, Rn (256)
sxth Xd, Wn
ext16u_i32
Zero extends the lower 16b of a register into a 32b destination.
IR Format: ext16u Rd, Rn
Gadget type: gadget per Rd, Rn (256)
and Wd, Wn, #0xffff
ext16u_i64
Zero extends the lower 16b of a register into a 64b destination.
IR Format: ext16u Rd, Rn
Gadget type: gadget per Rd, Rn (256)
and Wd, Wn, #0xffff
ext32s_i64
Sign extends the lower 32b of a register into a 64b destination.
IR Format: ext32s Rd, Rn
Gadget type: gadget per Rd, Rn (256)
sxtw Xd, Wn
ext32u_i64
Zero extends the lower 32b of a register into a 64b destination.
IR Format: ext32u Rd, Rn
Gadget type: gadget per Rd, Rn (256)
and Xd, Xn, #0xffffffff
ext_i32_i64
Sign extends the lower 32b of a register into a 64b destination.
IR Format: ext32s Rd, Rn
Gadget type: gadget per Rd, Rn (256)
sxtw Xd, Wn
extu_i32_i64
Zero extends the lower 32b of a register into a 64b destination.
IR Format: ext32u Rd, Rn
Gadget type: gadget per Rd, Rn (256)
and Xd, Xn, #0xffffffff
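The difference between the sign-extending and zero-extending variants is easy to get wrong, so here is a small Python sketch of both. The helper names are hypothetical; `as_u64` shows the bit pattern a 64-bit register would hold.

```python
def zext(v, bits):
    # Keep the low `bits` bits; everything above becomes zero.
    return v & ((1 << bits) - 1)

def sext(v, bits):
    # Replicate bit (bits - 1) into all higher positions.
    v &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (v ^ sign) - sign

def as_u64(v):
    return v & 0xFFFFFFFFFFFFFFFF

print(hex(as_u64(sext(0x80000000, 32))))  # prints 0xffffffff80000000
print(hex(as_u64(zext(0x80000000, 32))))  # prints 0x80000000
```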
bswap16_i32
Byte-swaps a 16b quantity.
IR Format: bswap16 Rd, Rn
Gadget type: gadget per Rd, Rn (256)
rev w27, Wn
lsr Wd, w27, #16
bswap16_i64
Byte-swaps a 16b quantity.
IR Format: bswap16 Rd, Rn
Gadget type: gadget per Rd, Rn (256)
rev w27, Wn
lsr Wd, w27, #16
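The two-instruction sequence works because reversing all four bytes moves the swapped low halfword into the top 16 bits, and the shift brings it back down while discarding the rest. A Python sketch of the same steps, with hypothetical function names:

```python
def rev32(v):
    # Models: rev Wd, Wn (reverse the four bytes of a 32-bit value)
    return int.from_bytes((v & 0xFFFFFFFF).to_bytes(4, "little"), "big")

def bswap16(v):
    # rev w27, Wn ; lsr Wd, w27, #16
    return rev32(v) >> 16

print(hex(bswap16(0x1234)))      # prints 0x3412
print(hex(bswap16(0xABCD1234)))  # prints 0x3412 (upper halfword is dropped)
```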
bswap32_i32
Byte-swaps a 32b quantity.
IR Format: bswap32 Rd, Rn
Gadget type: gadget per Rd, Rn (256)
rev Wd, Wn
bswap32_i64
Byte-swaps a 32b quantity.
IR Format: bswap32 Rd, Rn
Gadget type: gadget per Rd, Rn (256)
rev Wd, Wn
bswap64_i64
Byte-swaps a 64b quantity.
IR Format: bswap64 Rd, Rn
Gadget type: gadget per Rd, Rn (256)
rev Xd, Xn
exit_tb
Exits the translation block. Has no gadget; but instead inserts the address of the translation block epilogue.
mb
Memory barrier.
IR Format: mb <type>
Gadget type: gadget per type
# !!! TODO
Note
We still need to work out how QEMU MB types map to AArch64 ones. This might take nuance.