E9Tool User's Guide

January 22, 2025 · View on GitHub

1. Usage
2. Matching Language
3. Patching Language

1. Usage

E9Tool is the default frontend for E9Patch. E9Tool translates high-level patching commands (i.e., what instructions to patch, and how to patch them) into low-level commands for E9Patch.

The basic usage of E9Tool is as follows:

    $ e9tool -M MATCH -P PATCH binary

Where:

binary is the binary to patch (executable or shared object)
-M MATCH specifies which instructions in binary to patch (see Matching Language below)
-P PATCH specified how matching instructions should be patched (see Patching Language below)

After rewriting, the patched binary will be written to a.out (for executables) or a.so (for shared objects) by default.

For example, the following command will instrument all jump instructions in the xterm binary:

    $ e9tool -M jmp -P print xterm

Running the patched binary yields:

    $ ./a.out
    jz 0x4064d5
    jz 0x452c36
    jnz 0x4092d0
    ...

E9Tool supports many options, see e9tool --help for more information.

1.1 Optimization

E9Tool supports several optimization options, namely:

-O0 disables all optimization
-O1 conservatively optimizes for performance
-O2 optimizes for performance
-O3 aggressively optimizes for performance
-Os optimizes for space

The default optimization level of -O2.

1.2 Compression

E9Tool supports different compression levels for the output binary, controlled by the -c N option for N in 0..9. Here 9 means the most compression, and 0 is the least compression.

Higher compression levels generally result in smaller output binaries, but will use more mappings (mmap() calls), sometimes in the order of thousands. Lower compression levels will use less mappings, but may bloat the output file size significantly. The default compression level is 9 (most compression).

1.3 Rewriting Modes

E9Tool (and E9Patch) supports three main rewriting modes:

Default: Classic binary rewriting without control-flow recovery. This is the "official" mode of E9Tool/E9Patch for benchmarking and testing purposes.
CFR: A control-flow-recovery (CFR) mode that uses a (conservative) CFR analysis to optimize further binary rewriting.
100%: A full-coverage (100%) mode that ensures that every matching instruction will be patched, even if this (significantly) reduces performance.

Below if a summary of the different modes and features:

	Option	Performance	Robustness	Coverage
Default		★★	★★★	★★
CFR	`-CFR`	★★★	★½	★★½
100%	`-100`	0	★★★	★★★

Here, Option is the E9Tool command-line option to enable the mode, Performance is the relative performance of the rewitten binary, Robustness is the relative absense rewriting errors, and Coverage is the patching coverage (number of matching instructions actually patched).

1.3.1 Control-Flow-Recovery Mode

The CFR mode can be enabled by passing -CFR to E9Tool, e.g.:

    $ e9tool -CFR -M jmp -P print xterm

The CFR mode has pros and cons, which may or may not be acceptable depending on the application. The main pros are:

The rewritten binary will be much faster
The patching coverage will be much higher
The rewriting speed will be improved

However, since CFR is heuristic-based, it may not be complete, leading to possible rewriting errors. Thus, the CFR mode is not as robust as the default mode, although it should be compatible with most binaries.

1.3.2 Full-Coverage Mode

The 100% (full coverage) mode can be enabled by passing -100 to E9Tool, e.g.:

    $ e9tool -100 -M jmp -P print xterm

This mode aims to achieve 100% patching coverage, even if this (significantly) degrades performance. To do so, the 100% mode will resort to illegal opcodes and SIGILL handlers if an instruction cannot be patched using any other method. The 100% mode is useful for applications that need full coverage, even if this comes at the cost of performance.

Note that the 100% mode has some caveats:

Any attempt by the program to install a SIGILL signal handler will fail with errno=ENOSYS. This potentially breaks transparency.
The patching coverage may not reach 100% for other reasons, such as virtual address space exhaustion. Such cases will be uncommon.

The 100% and CFR modes are compatible and can be combined, e.g.:

    $ e9tool -100 -CFR -M jmp -P print xterm

1.4 Disassembly and Analysis

For convenience, E9Tool comes with a built-in linear disassembler that should handle some binaries compiled with standard compilers, such as gcc. However, linear disassemblers have known limitations for some binaries that mix code and data. It is possible to override the default disassembler using the --use-disasm option, e.g.:

    $ e9tool --use-disasm disasm.csv ...

Here, disasm.csv is a single-column comma-separated-value (CSV) file that should contain all instruction addresses to be disassembled. The disasm.csv file can be generated by other disassemblers, and integrated into the E9Tool/E9Patch toolchain.

Similarly, E9Tool's default control-flow-recovery analysis can be overridden by the --use-targets option, e.g.:

    $ e9tool --use-targets targets.csv ...

Here, targets.csv is a CSV file with one or two columns. The first column is the list of all jump/call target addresses in the binary. The second column, if present, is a Boolean value (either 0 or 1), where a value of 1 indicates that the target is a function entry, and 0 otherwise. Like disassembly, the targets.csv file can be generated by other binary analysis tools and integrated into the E9Tool/E9Patch toolchain.

Note that --use-targets option does not affect the internal control-flow-recovery analysis for E9Patch (for the -X option). Rather, the option only affects E9Tool's matching/patching operations.

2. Matching Language

The matching language specifies what instructions should be patched by the corresponding patch (see below). Matchings are specified using the (--match MATCH) or (-M MATCH) command-line option. The basic form of a matching (MATCH) is a Boolean expression of TESTs using the following high-level grammar:

    MATCH ::= EXPR
    EXPR  ::=
              | VALUE
              | VARIABLE
              | defined( EXPR )
              | ( EXPR )
              | not EXPR
              | EXPR and EXPR
              | EXPR or EXPR
              | EXPR + EXPR | EXPR - EXPR | EXPR * EXPR | EXPR / EXPR | EXPR % EXPR | -EXPR
              | EXPR & EXPR | EXPR | EXPR | EXPR ^ EXPR | ~EXPR
              | EXPR << EXPR | EXPR >> EXPR
              | EXPR == EXPR | EXPR != EXPR
              | EXPR < EXPR | EXPR <= EXPR
              | EXPR > EXPR | EXPR >= EXPR

An instruction will match a given expression EXPR if the expression evaluates to a non-zero value.

Each VARIABLE evaluates to some specific property/attribute of the underlying instruction, defined using the following grammar:

    VARIABLE ::= [ SPECIFIER . ] ATTRIBUTE

See the list of attributes and instruction specifiers below.

A VALUE can be one of the following:

An integer constant, e.g., 123, 0x123, etc.
A string constant, e.g., "abc", etc.
An enumeration value, including:
- register names (rax, eax, etc.)
- operand types (imm, reg, mem)
- access types (-, r, w, rw)
A memory operand (see below).
A symbolic address of the form NAME, where NAME is any section or symbol name from the input ELF file. Section names can be modified with a .start or .end suffix, where the latter will point to the end of the section. A symbolic address has type Integer.
A set of VALUEs, e.g., {rax,rbx,rcx}.
A regular expression delimited by slashes (/), e.g., /xor.*/, /mov.+\(%rax.*/, etc.

String values can be matched (or not matched) against regular expressions using the equality == (or disequality !=) comparison operators. For example, the test ("mov (%rax,%rbx,8),%rcx" == /mov.+\(%rax.*/) will evaluate to true.

Memory operands can be represented using the following syntax:

    ( mem8 | mem16 | mem32 | mem64 ) < MEMOP >

Here, the mem8...mem64 token specifies the size of the memory operand, and MEMOP is the memory operand itself specified in AT&T syntax. For example, the following explicit memory operands access stack memory:

    mem32<(%rax)>
    mem64<0x100(%rsp)>
    mem64<0x200(%rsp,%rax,8)>
    ...

Finally, several operators are supported that can be used to build expressions. Supported operators include:

Operator	Description
`+`	Integer addition
`-`	Integer subtraction or unary negation
*``**	Integer multiplication
`/` or `div`	Integer division
`%` or `mod`	Integer modulus
`&`	Integer bitwise and
`\|`	Integer bitwise or
`^`	Integer bitwise xor
`~`	Integer bitwise negation
`<<`	Integer left shift
`>>`	Integer right shift
`=` or `==`	Equality
`!=`	Disequality
`>`	Greater-than
`>=`	Greater-than-or-equal-to
`<`	Less-than
`<=`	Less-than-or-equal-to
`in`	Set membership or subset
`and`	Boolean and
`or`	Boolean or
`not`	Boolean negation

Alternatively, C-style Boolean operations (!, &&, and ||) can be used instead of (not, and, and or).

2.1 Attributes

The following ATTRIBUTEs (with corresponding types) are supported:

Attribute	Type	Description
`true`	`Boolean`	True
`false`	`Boolean`	False
`jmp`	`Boolean`	True for jump instructions, false otherwise
`jcc`	`Boolean`	True for conditional jump instructions, false otherwise
`call`	`Boolean`	True for call instructions, false otherwise
`ret`	`Boolean`	True for return instructions, false otherwise
`asm`	`String`	The assembly string representation
`mnemonic`	`String`	The mnemonic
`section`	`String`	The section name
`addr`	`Integer`	The ELF virtual address
`offset`	`Integer`	The ELF file offset
`size`	`Integer`	The size of the instruction in bytes
`random`	`Integer`	A random value [0..`RAND_MAX`]
`target`	`Integer`	The jump/call target (if statically known).
`x87`	`Boolean`	True for x87 instructions, false otherwise
`mmx`	`Boolean`	True for MMX instructions, false otherwise
`sse`	`Boolean`	True for SSE instructions, false otherwise
`avx`	`Boolean`	True for AVX instructions, false otherwise
`avx2`	`Boolean`	True for AVX2 instructions, false otherwise
`avx512`	`Boolean`	True for AVX512 instructions, false otherwise
`bytes[i]`	`Integer`	The i^th instruction byte
`rex`	`Integer`	The value of the REX prefix if used, undefined otherwise
`modrm`	`Integer`	The value of the MODRM byte if used, undefined otherwise
`sib`	`Integer`	The value of the SIB byte if used, undefined otherwise
`disp8`	`Integer`	The value of the 8-bit displacement if used, undefined otherwise
`disp32`	`Integer`	The value of the 32-bit displacement if used, undefined otherwise
`imm8`	`Integer`	The value of the 8-bit immediate if used, undefined otherwise
`imm32`	`Integer`	The value of the 32-bit immediate if used, undefined otherwise
`op.size`	`Integer`	The number of operands
`src.size`	`Integer`	The number of source operands
`dst.size`	`Integer`	The number of destination operands
`imm.size`	`Integer`	The number of immediate operands
`reg.size`	`Integer`	The number of register operands
`mem.size`	`Integer`	The number of memory operands
`op[i]`	`Operand`	The i^th operand
`src[i]`	`Operand`	The i^th source operand
`dst[i]`	`Operand`	The i^th destination operand
`imm[i]`	`Operand`	The i^th immediate operand
`reg[i]`	`Operand`	The i^th register operand
`mem[i]`	`Operand`	The i^th memory operand
`&mem[i]`	`Integer`	The i^th memory operand address, if statically known
`op[i].size`	`Integer`	The i^th operand size
`src[i].size`	`Integer`	The i^th source operand size
`dst[i].size`	`Integer`	The i^th destination operand size
`imm[i].size`	`Integer`	The i^th immediate operand size
`reg[i].size`	`Integer`	The i^th register operand size
`mem[i].size`	`Integer`	The i^th memory operand size
`op[i].type`	`{imm,reg,mem}`	The i^th operand type
`src[i].type`	`{imm,reg,mem}`	The i^th source operand type
`dst[i].type`	`{imm,reg,mem}`	The i^th destination operand type
`op[i].access`	`{-,r,w,rw}`	The i^th operand access
`src[i].access`	`{-,r,w,rw}`	The i^th source operand access
`dst[i].access`	`{-,r,w,rw}`	The i^th destination operand access
`reg[i].access`	`{-,r,w,rw}`	The i^th register operand access
`mem[i].access`	`{-,r,w,rw}`	The i^th memory operand access
`op[i].seg`	`Register`	The i^th operand segment register
`src[i].seg`	`Register`	The i^th source operand segment register
`dst[i].seg`	`Register`	The i^th destination operand segment register
`mem[i].seg`	`Register`	The i^th memory operand segment register
`op[i].disp`	`Integer`	The i^th operand displacement
`src[i].disp`	`Integer`	The i^th source operand displacement
`dst[i].disp`	`Integer`	The i^th destination operand displacement
`mem[i].disp`	`Integer`	The i^th memory operand displacement
`op[i].base`	`Register`	The i^th operand base register
`src[i].base`	`Register`	The i^th source operand base register
`dst[i].base`	`Register`	The i^th destination operand base register
`mem[i].base`	`Register`	The i^th memory operand base register
`op[i].index`	`Register`	The i^th operand index register
`src[i].index`	`Register`	The i^th source operand index register
`dst[i].index`	`Register`	The i^th destination operand index register
`mem[i].index`	`Register`	The i^th memory operand index register
`op[i].scale`	`Integer`	The i^th operand scale
`src[i].scale`	`Integer`	The i^th source operand scale
`dst[i].scale`	`Integer`	The i^th destination operand scale
`mem[i].scale`	`Integer`	The i^th memory operand scale
`regs`	`Set<Register>`	The set of all accessed registers
`reads`	`Set<Register>`	The set of all read-from registers
`writes`	`Set<Register>`	The set of all written-to registers
`BB`	`Integer`	The ELF virtual address of the current basic-block
`BB.addr`	`Integer`	Alias for `BB`
`BB.offset`	`Integer`	The ELF file offset of the current basic-block
`BB.entry`	`Boolean`	True for the first instruction in the current basic-block, false otherwise.
`BB.exit`	`Boolean`	True for the last instruction in the current basic-block, false otherwise.
`BB.best`	`Boolean`	True for the "best" instruction in the current basic-block, false otherwise.
`BB.size`	`Integer`	The size of the current basic-block in bytes
`BB.len`	`Integer`	The number of instructions in the current basic-block
`F`	`Integer`	The ELF virtual address of the current function
`F.addr`	`Integer`	Alias for `F`
`F.offset`	`Integer`	The ELF file offset of the current function
`F.entry`	`Boolean`	True for the first instruction in the current function, false otherwise.
`F.best`	`Boolean`	True for the "best" instruction in the current function, false otherwise.
`F.size`	`Integer`	The size of the current function in bytes
`F.len`	`Integer`	The number of instructions in the current function
`F.name`	`String`	The name of the function (if available)
`file`	`String`	The source filename (if available)
`absname`	`String`	The full path of `file` (if available)
`basename`	`String`	The basename component of `absname` (if available)
`dirname`	`String`	The directory component of `absname` (if available)
`line`	`Integer`	The source line number (if available)
`line.entry`	`Boolean`	True for the first instruction in each line (if available)
`NAME[i]`	`Integer \| String`	The corresponding value from the `NAME.csv` file
`plugin(NAME).match()`	`Integer`	Value from `NAME.so` plugin

Here Register is the set of all x86_64 register names defined as follows:

    Register = {
        rip, rflags,
        es, cs, ss, ds, fs, gs,
        ah, ch, dh, bh,
        al, cl, dl, bl, spl, bpl, sil, dil, r8b, ..., r15b,
        ax, cx, dx, bx, sp, bp, si, di, r8w, ..., r15w,
        eax, ecx, edx, ebx, esp, ebp, esi, edi, r8d, ..., r15d,
        rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi, r8, ..., r15,
        xmm0, ..., xmm31,
        ymm0, ..., ymm31,
        zmm0, ..., zmm31, ...}

An Operand can be one of three values:

An immediate value represented by an Integer
A register represented by a Register
A memory operand represented by a MemOp

Thus the Operand type is the union of the Integer, Register, and MemOp types:

    Operand = Integer | Register | MemOp

The file, absname, basename, dirname, line, and line.entry attributes are only defined if the binary was compiled with debug information (-g).

2.2 Definedness

Not all attributes are defined for all instructions. For example, if the instruction has 3 operands, then only op[0], op[1], and op[2] will be defined, and op[3] and beyond will be undefined. Similarly, op[0].base will be undefined if the first operand of the instruction is not a memory operand.

Any comparison that uses an undefined value will fail. For example, both of the tests (op[3] == 0x1) and (op[3] != 0x1) will fail, despite each test being the negation of the other. The explicit Boolean operators (not, and, and or) treat failure due to undefinedness the same as false, thus the tests (op[3] != 0x1) and (not op[3] == 0x1) are not equivalent for undefined values.

The special defined(EXPR) test can be used to determine if the given expression is defined or not.

2.3 Control-flow

The BB.* attributes represent properties over the current basic-block which contains the instruction being matched. Here, a basic-block is a straight-line instruction sequence with a single entry point and a single exit (excluding function calls). The set of basic-blocks are recovered from the input binary using a simple built-in static analysis. That said, basic-block recovery is well-known to be an undecidable problem in the general case, meaning that the built-in analysis must rely on several heuristics that may not be perfectly accurate. As such, the BB.* attributes should only be used for applications (e.g., optimization) where some inaccuracy can be tolerated. Finally, we note that the recovered basic-block information is only made available to the application layer. The recovered information is not passed to (or used by) the underlying E9Patch binary rewriter.

The BB.best attribute selects the "best" instruction in a basic-block to instrument in order to maximum coverage and speed. This is useful for applications that need to instrument at the basic-block level (rather than the instruction level).

Similarly, the F.* attributes represent properties over the current function which contains the instruction being matched. As with basic-blocks, E9Tool uses a very simple (heuristic-based) function recovery analysis that is not guaranteed to be accurate, so function matching should only be used for applications where some inaccuracy can be tolerated. The F.name attribute is the name of the current function if known (i.e., there exists an entry in an ELF symbol table), else the result is undefined.

2.4 Instruction Specifiers

The attribute expression may be annotated by an explicit instruction SPECIFIER of the following form:

        SPECIFIER ::= INSTR-SET [ INDEX ]
        INSTR-SET ::= ( I | BB | F)

Here INSTR-SET is one of the following instruction sets:

I: The set of all disassembled instructions.
BB: The set of all instructions in the current basic-block.
F: The set of all instructions in the current function.

The INDEX is a signed integer which represents an offset relative to the current instruction. For example, I[0] is the current instruction, I[1] is the next instruction, I[-1] is the previous instruction, etc. Similarly, BB[1] is the next instruction in the current basic-block, F[-1] is the previous instruction in the current function, etc. Note that previous/next instructions may not exist, in which case the result will be undefined.

If unspecified, the instruction specifier is implicitly I[0] (i.e., the current instruction). Note that BB[0] and F[0] may be undefined if the current instruction does not belong to any basic block or function. For example, padding NOPs inserted by the compiler for alignment purposes are not considered part of a basic block.

Instruction specifiers are useful for matching some context around the current instruction. For example, the following matches all conditional jump instructions that are immediately preceded by a comparison in the same basic block:

    jcc and BB[-1].mnemonic == "cmp"

2.5 Comma-Separated Values

It is possible to match against user-defined data stored in one or more comma-separated values (CSV) files using the NAME[i] attribute. This makes it possible to match against data generated by other binary analysis tools, e.g., control-flow information, etc.

Here, the NAME[i] attribute will parse the NAME.csv file and resolve to the following value:

The row is selected by the address of the matching instruction, which is matched against the first column stored in the NAME.csv file.
The column is selected by the index i.

If neither the row nor column exist the result is undefined.

For example, suppose the file.csv file contains the following contents:

    0x400100,1,"Monday",0xaaa
    0x400105,2,"Tuesday",0xbbb
    0x40010a,3,"Wednesday",0xccc

When matching the instruction at address 0x400105, we have that (file[0] == 0x400105), (file[1] == 2), (file[2] == "Tuesday"), etc. As seen by this example, CSV files can be used to store both integer and string values.

2.6 Examples

(true): match every instruction.
(false): do not match any instruction.
(asm == /jmp.*%r.*/): match all instructions whose assembly representation matches the regular expression jmp.*%r.* (will match jump instructions that access a register).
(mnemonic == "jmp"): match all instructions whose mnemonic is jmp.
(addr == 0x4234a7): match the instruction at the virtual address 0x4234a7.
(addr >= 0x4234a7 and addr <= 0x4514b4): match all instructions in the virtual address range 0x4234a7..0x4514b4
(op.size > 1): match all instructions with more than one operand.
(reg.size == 2): match all instructions with exactly two register operands.
(op[0] == 0x1234): match all instructions where the first operand is the immediate value 0x1234.
(op[0] == rax): match all instructions where the first operand is the %rax register.
(op[0].type == mem): match all instructions where the first operand is a memory operand.
(reg[0] == rax and reg[1] == rbx): match all instructions where the first and second register operands are %rax and %rbx respectively.
(mem[0].base == rax and mem[0].index == rbx): match all instructions with a memory operand with %rax as the base and %rbx as the index.
(mem[0].base == nil): match all instructions with a memory operand that does not use a base register.
(op[0] == op[1]): match all instructions where the first two operands are the same.
(rflags in reads): match all instructions that read the flags register.
(rflags in writes): match all instructions that modify the flags register.
(not rflags in regs): match all instructions that do not access the flags register.
defined(mem[0]): match all instructions that have at least one memory operand.
(call and target == &malloc): match all direct calls to malloc().
({rax, rdx} in writes): match all instructions that write to registers %rax and %rdx.
(op[0] == mem64<0x200(%rsp,%rax,8)>): match all instructions with the corresponding memory operand.

2.7 Exclusions

Exclusions are an additional method for controlling which instructions are patched. An exclusion is specified by the (--exclude RANGE) or (or -E RANGE) command line option, where RANGE specifies a range of addresses that should not be disassembled or rewritten. Exclusions are more low-level than the matching language since the RANGE will not even be disassembled. This can help solve some problems, such as the binary storing data inside the .text section.

The general syntax for RANGE is:

    RANGE ::=   ADDR [ .. ADDR ]
    ADDR  ::=   VALUE [ + INTEGER ]
    VALUE ::=   INTEGER
              | SYMBOL
              | SECTION [ . ( start | end ) ]

For example:

0x12345..0x45689: exclude a specific address range
.text..ChromeMain: exclude the .text section up to the symbol ChromeMain
.plt .. .text: exclude a range of sections
.plt.start .. .text.end: equivalent to the above
.plt .. .text.start: exclude all sections between .plt and the starting address of .text. The .text section itself will not be excluded.
malloc .. malloc+16: exclude the 16-byte PLT entry for malloc.
.text: exclude the entire .text section.

Note that a RANGE may include a lower and upper bound, i.e., LB .. UB. If the UB is omitted, then UB=LB is implied. The instruction at the address UB is not excluded, and disassembly will resume from this address. In other words, the syntax LB .. UB represents the address range [LB..UB), and E9Tool assumes that UB points to a valid instruction from which disassembly can resume.

3. Patching Language

The patch language specifies how to patch matching instructions from the input binary. Patches are specified using the (--patch PATCH) or (-P PATCH) command-line option, and must be paired with one or more matchings. The basic form of a patch (PATCH) uses the following high-level grammar:

    PATCH      ::= [ POSITION ] TRAMPOLINE
    POSITION   ::=   before
                   | replace
                   | after
    TRAMPOLINE ::=   empty
                   | break
                   | trap
                   | exit(CODE)
                   | signal(SIG)
                   | print
                   | CALL
                   | if CALL break
                   | if CALL goto
                   | plugin(NAME).patch()

A patch is an optional position followed by a trampoline. The trampoline represents code that will be executed when control-flow reaches the matching instruction. The trampoline can be either a builtin trampoline, a call trampoline, or a trampoline defined by a plugin.

3.1 Builtin Trampolines

The builtin trampolines include:

Patch	Description
`empty`	The empty trampoline
`break`	Immediately return from trampoline
`trap`	Execute a TRAP (`int3`) instruction
`exit(CODE)`	Exit with `CODE`
`signal(SIG)`	Raise signal `SIG`
`print`	Printing the matching instruction

Here:

empty is the empty trampoline with no instructions. Control-flow is still redirected to/from empty trampolines, and this can be used to establish a baseline for benchmarking.
break immediately returns from the trampoline back to the main program.
trap executes a single TRAP (int3) instruction.
exit(CODE) will immediately exit from the program with status CODE.
signal(SIG) will raise signal SIG in the current thread (equivalent to kill(gettid(), SIG)).
print will print the assembly representation of the matching instruction to stderr. This can be used for testing and debugging.

3.2 Call Trampolines

A call trampoline calls a user-defined function that can be implemented in a high-level programming language such as C or C++. Call trampolines are the main way of implementing custom patches using E9Tool. The syntax for a call trampoline is as follows:

    CALL ::= FUNCTION [ ABI ] ARGS @ BINARY
    ABI  ::= < clean | naked >
    ARGS ::= ( ARG , ... )

The call trampoline specifies that the trampoline should call function FUNCTION from the binary BINARY with the arguments ARGS.

To use a call trampoline:

Implement the desired patch as a function using the C or C++ programming language.
Compile the patch program using the special e9compile.sh script to generate a patch binary.
Use an E9Tool to call the patch function from the patch binary at the desired locations.

E9Tool will handle all of the low-level details, such as loading the patch binary into memory, passing the arguments to the function, and saving/restoring the CPU state.

For example, the following code defines a function that increments a counter. Once the counter exceeds some predefined maximum value, the function will execute the int3 instruction, causing SIGTRAP to be sent to the program.

    static unsigned long counter = 0;
    static unsigned long max = 100000;
    void entry(void)
    {
        counter++;
        if (counter >= max)
            asm volatile ("int3");
    }

Once defined, the program can be compiled using the e9compile.sh script.

    ./e9compile.sh counter.c

The e9compile.sh script is a gcc wrapper that ensures the generated binary is compatible with E9Tool. In this case, the script will generate a counter binary if compilation is successful.

Finally, the counter binary can be used as a call trampoline. For example, to generate a SIGTRAP after the 10000th xor instruction:

    ./e9tool -M 'mnemonic=="xor"' -P 'entry()@counter' ...

Call trampolines are primarily designed for ease-of-use and not for speed. For applications where speed is essential, it is recommended to design a custom trampoline using a plugin.

3.2.1 Call Trampoline Arguments

Call trampolines also support passing arguments to the called function. The syntax uses the C-style round brackets. For example:

    ./e9tool -M ... -P 'func(rip)@example' xterm

This specifies that the current value of the instruction pointer %rip should be passed as the first argument to the function func(). The called function can use this argument, e.g.:

    void func(const void *rip)
    {
        ...
    }

Call trampolines support up to eight arguments. The following arguments are supported:

Argument	Type	Description
Integer	`intptr_t`	An integer constant
String	`const char *`	A string constant
`&`Name	`const void *`	The runtime address of the named section/symbol/PLT/GOT entry
`(static)&`Name	`const void *`	The ELF address of the named section/symbol/PLT/GOT entry
`NULL`	`std::nullptr_t`	The `NULL` pointer
`asm`	`const char *`	Assembly representation of the matching instruction
`asm.size`	`size_t`	The number of bytes in `asm` (including the nul character)
`asm.len`	`size_t`	The string length of `asm` (excluding the nul character)
`base`	`const void *`	The runtime base address of the binary
`config`	`const void *`	A pointer to the E9Patch configuration (see `e9loader.h`)
`addr`	`const void *`	The runtime address of the matching instruction
`(static)addr`	`const void *`	The ELF address of the matching instruction
`id`	`intptr_t`	A unique identifier (one per patch)
`bytes`	`const uint8_t *`	The machine-code bytes of the matching instruction
`rex`	`int8_t`	The value of the REX prefix
`modrm`	`int8_t`	The value of the MODRM byte
`sib`	`int8_t`	The value of the SIB byte
`disp8`	`int8_t`	The value of the 8-bit displacement
`disp32`	`int32_t`	The value of the 32-bit displacement
`imm8`	`int8_t`	The value of the 8-bit immediate
`imm32`	`int32_t`	The value of the 32-bit immediate
`next`	`const void *`	The runtime address of the next executed instruction
`(static)next`	`const void *`	The ELF address of the next executed instruction
`offset`	`off_t`	The ELF file offset of the matching instruction
`target`	`const void *`	The runtime address of the jump/call/return target, else `NULL`
`(static)target`	`const void *`	The ELF address of the jump/call/return target, else `NULL`
`trampoline`	`const void *`	The runtime address of the trampoline
`random`	`intptr_t`	A (statically generated) random integer [0..`RAND_MAX`]
`size`	`size_t`	The size of `bytes`
`state`	`void *`	A pointer to a structure containing all general purpose registers
`ah`,...,`dh`, `al`,...,`r15b`	`int8_t`	The corresponding 8bit register
`ax`,...,`r15w`	`int16_t`	The corresponding 16bit register
`eax`,...,`r15d`	`int32_t`	The corresponding 32bit register
`rax`,...,`r15`	`int64_t`	The corresponding 64bit register
`rflags`	`int64_t`	The `%rflags` register
`flags`	`int16_t`	The `%rflags` status flags with format `SF:ZF:0:AF:0:PF:1:CF:0:0:0:0:0:0:0:OF`
`rip`	`const void *`	The `%rip` register
`&ah`,...,`&dh`, `&al`,...,`&r15b`	`int8_t *`	The corresponding 8bit register (passed-by-pointer)
`&ax`,...,`&r15w`	`int16_t *`	The corresponding 16bit register (passed-by-pointer)
`&eax`,...,`&r15d`	`int32_t *`	The corresponding 32bit register (passed-by-pointer)
`&rax`,...,`&r15`	`int64_t *`	The corresponding 64bit register (passed-by-pointer)
`&rflags`	`int64_t *`	The `%rflags` register (passed-by-pointer)
`&flags`	`int16_t *`	The status flags from `%rflags` (passed-by-pointer)
`op[i]`	`int8/16/32/64_t`	The matching instruction's i^th operand
`src[i]`	`int8/16/32/64_t`	The matching instruction's i^th source operand
`dst[i]`	`int8/16/32/64_t`	The matching instruction's i^th destination operand
`imm[i]`	`int8/16/32/64_t`	The matching instruction's i^th immediate operand
`reg[i]`	`int8/16/32/64_t`	The matching instruction's i^th register operand
`mem[i]`	`int8/16/32/64_t`	The matching instruction's i^th memory operand
`&op[i]`	`(const) int8/16/32/64_t *`	The matching instruction's i^th operand (passed-by-pointer)
`&src[i]`	`(const) int8/16/32/64_t *`	The matching instruction's i^th source operand (passed-by-pointer)
`&dst[i]`	`int8/16/32/64_t *`	The matching instruction's i^th destination operand (passed-by-pointer)
`&imm[i]`	`const int8/16/32/64_t *`	The matching instruction's i^th immediate operand (passed-by-pointer)
`&reg[i]`	`(const) int8/16/32/64_t *`	The matching instruction's i^th register operand (passed-by-pointer)
`&mem[i]`	`int8/16/32/64_t *`	The matching instruction's i^th memory operand (passed-by-pointer)
`op[i].size`	`size_t`	The matching instruction's i^th operand size
`src[i].size`	`size_t`	The matching instruction's i^th source operand size
`dst[i].size`	`size_t`	The matching instruction's i^th destination operand size
`imm[i].size`	`size_t`	The matching instruction's i^th immediate operand size
`reg[i].size`	`size_t`	The matching instruction's i^th register operand size
`mem[i].size`	`size_t`	The matching instruction's i^th memory operand size
`op[i].type`	`int8_t`	The matching instruction's i^th operand type (1=immediate, 2=register, 3=memory operand)
`src[i].type`	`int8_t`	The matching instruction's i^th source operand type
`dst[i].type`	`int8_t`	The matching instruction's i^th destination operand type
`imm[i].type`	`int8_t`	The matching instruction's i^th immediate operand type
`reg[i].type`	`int8_t`	The matching instruction's i^th register operand type
`mem[i].type`	`int8_t`	The matching instruction's i^th memory operand type
`op[i].access`	`int8_t`	The matching instruction's i^th operand access (`0x80 \| PROT_READ \| PROT_WRITE`)
`src[i].access`	`int8_t`	The matching instruction's i^th source operand access
`dst[i].access`	`int8_t`	The matching instruction's i^th destination operand access
`imm[i].access`	`int8_t`	The matching instruction's i^th immediate operand access
`reg[i].access`	`int8_t`	The matching instruction's i^th register operand access
`mem[i].access`	`int8_t`	The matching instruction's i^th memory operand access
`op[i].disp`	`int32_t`	The matching instruction's i^th operand displacement
`src[i].disp`	`int32_t`	The matching instruction's i^th source operand displacement
`dst[i].disp`	`int32_t`	The matching instruction's i^th destination operand displacement
`mem[i].disp`	`int32_t`	The matching instruction's i^th memory operand displacement
`op[i].base`	`int32/64_t`	The matching instruction's i^th operand base register
`src[i].base`	`int32/64_t`	The matching instruction's i^th source operand base register
`dst[i].base`	`int32/64_t`	The matching instruction's i^th destination operand base register
`mem[i].base`	`int32/64_t`	The matching instruction's i^th memory operand base register
`&op[i].base`	`int32/64_t *`	The matching instruction's i^th operand base register (passed-by-pointer)
`&src[i].base`	`int32/64_t *`	The matching instruction's i^th source operand base register (passed-by-pointer)
`&dst[i].base`	`int32/64_t *`	The matching instruction's i^th destination operand base register (passed-by-pointer)
`&mem[i].base`	`int32/64_t *`	The matching instruction's i^th memory operand base register (passed-by-pointer)
`op[i].index`	`int32/64_t`	The matching instruction's i^th operand index register
`src[i].index`	`int32/64_t`	The matching instruction's i^th source operand index register
`dst[i].index`	`int32/64_t`	The matching instruction's i^th destination operand index register
`mem[i].index`	`int32/64_t`	The matching instruction's i^th memory operand index register
`&op[i].index`	`int32/64_t *`	The matching instruction's i^th operand index register (passed-by-pointer)
`&src[i].index`	`int32/64_t *`	The matching instruction's i^th source operand index register (passed-by-pointer)
`&dst[i].index`	`int32/64_t *`	The matching instruction's i^th destination operand index register (passed-by-pointer)
`&mem[i].index`	`int32/64_t *`	The matching instruction's i^th memory operand index register (passed-by-pointer)
`op[i].scale`	`int8_t`	The matching instruction's i^th operand scale
`src[i].scale`	`int8_t`	The matching instruction's i^th source operand scale
`dst[i].scale`	`int8_t`	The matching instruction's i^th destination operand scale
`mem[i].scale`	`int8_t`	The matching instruction's i^th memory operand scale
`mem8<MEMOP>`	`int8_t`	An explicit 8-bit `MEMOP`
`mem16<MEMOP>`	`int16_t`	An explicit 16-bit `MEMOP`
`mem32<MEMOP>`	`int32_t`	An explicit 32-bit `MEMOP`
`mem64<MEMOP>`	`int64_t`	An explicit 64-bit `MEMOP`
`&mem8<MEMOP>`	`int8_t *`	An explicit 8-bit `MEMOP` (passed-by-pointer)
`&mem16<MEMOP>`	`int16_t *`	An explicit 16-bit `MEMOP` (passed-by-pointer)
`&mem32<MEMOP>`	`int32_t *`	An explicit 32-bit `MEMOP` (passed-by-pointer)
`&mem64<MEMOP>`	`int64_t *`	An explicit 64-bit `MEMOP` (passed-by-pointer)
`BB`	`const void *`	The address of the matching instruction's basic block
`(static)BB`	`const void *`	The ELF address of the matching instruction's basic block
`BB.addr`	`const void *`	Alias for `BB`
`BB.offset`	`off_t`	The ELF file offset of the matching instruction's basic block
`BB.size`	`size_t`	The size of the matching instruction's basic block in bytes
`BB.len`	`size_t`	The number of instructions in the matching instruction's basic block
`F`	`const void *`	The address of the matching instruction's function
`(static)F`	`const void *`	The ELF address of the matching instruction's function
`F.addr`	`const void *`	Alias for `F`
`F.offset`	`off_t`	The ELF file offset of the matching instruction's function
`F.size`	`size_t`	The size of the matching instruction's function in bytes
`F.len`	`size_t`	The number of instructions in the matching instruction's function
`F.name`	`const char *`	The matching instruction's function name
`file`	`const char *`	The source filename of the matching instruction
`absname`	`const char *`	The full path of `file`
`basename`	`const char *`	The basename component of `absname`
`dirname`	`const char *`	The directory component of `absname`
`line`	`int32_t`	The source line number of the matching instruction
`NAME[i]`	`int64_t/const char *`	The corresponding value from the `NAME.csv` file

Notes:

Accessing the %rflags register directly is relatively slow operation. For performance, consider accessing status flag bits indirectly using the alternative flags argument. Note that the flags argument uses a special SF:ZF:0:AF:0:PF:1:CF:0:0:0:0:0:0:0:OF layout (which differs from the native %rflags layout).
For technical reasons, the %rip register is considered constant and cannot be modified. To implement jumps, use conditional call trampolines instead.
The state argument is a pointer to a structure containing all general-purpose registers, the status flags (using the flags layout), the stack register (%rsp) and the instruction pointer register (%rip). See STATE defined in stdlib.c for the structure layout. Except for %rip, the values in the structure can be modified, in which case the corresponding register will be updated accordingly.
The NAME[i] argument will either be an integer or a string, depending on the corresponding value type from the NAME.csv file.
Section names can be modified with a .start or .end suffix, e.g., &.text.end points to the end of the .text section.

3.2.1.1 Pass-by-pointer Arguments

Some arguments can be passed by pointer. This allows the corresponding value to be modified (provided the corresponding type is not const), making it possible to manipulate the state of the program at runtime.

For example, the consider the following simple function defined in example.c:

    void inc(int64_t *ptr)
    {
        *ptr += 1;
    }

And the following patch:

    $ e9compile.sh example.c
    $ e9tool -M ... -P 'inc(&rax)@example' xterm

This patch will increment the %rax register when the inc() function is called for each matching instruction.

Attempting to write to a const pointer is undefined behavior. Typically, this will result in a crash or the written value will be silently ignored.

The passed pointer depends on the operand type:

For immediate operands (e.g., &imm[i]), the pointer will point to a constant value stored in read-only memory.
For register operands (e.g., &reg[i]), the pointer will point to a temporary location that holds the register value.
For memory operands (e.g., &mem[i]), the pointer will be exactly the runtime pointer value calculated by the operand itself. For example, consider the instruction (mov 0x33(%rax,%rbx,2),%rcx), then the value for &mem[0] will be (0x33+%rax+2*%rbx).

Generally, it is recommended to pass memory operands by pointer rather than by value. If passed by value, the memory operand pointer will be dereferenced, which may result in a crash for instructions such as (nop) and (lea) that do not access the operand.

3.2.1.2 Polymorphic Arguments

Some arguments can have different types, depending on the instruction. For example, with:

    mov %rax,%rbx
    mov %eax,%ebx
    mov %ax,%bx
    mov %al,%bl

The corresponding types for &op[0] will be (int64_t *), (int32_t *), (int16_t *) and (int8_t *) respectively. If the function is defined in C, there is no way to know the type of the passed argument.

One solution is to implement the functions in C++ rather than C, and to use function overloading. For example, using C++, one can define:

    void func(int64_t *x) { ... }
    void func(int32_t *x) { ... }
    void func(int16_t *x) { ... }
    void func(int8_t *x)  { ... }

Next, the program can be rewritten as follows:

    $ e9compile.sh example.cpp
    $ e9tool -M ... -P 'func(&op[0])@example' xterm

E9Tool will automatically select the function instance that best matches the argument types, or generate an error if no appropriate match can be found.

3.2.1.3 Type Casts and Modifiers

Arguments can also be cast to different types using a C-style syntax:

    (TYPE)ARG

Here, TYPE is defined as follows:

    TYPE ::= [ static ] ( INT-TYPE | PTR-TYPE )
    INT-TYPE ::= int8_t | int16_t | int32_t | int64_t
    PTR-TYPE ::= [ const ] ( INT-TYPE | char | void ) *

Here, the const/int8_t/int16_t/int32_t/int64_t/char/void keywords have the usual C/C++ meanings, for example:

(int8_t): cast to 8-bit integer.
(void *): cast to void pointer.
(const char *): cast to constant string pointer.

Type casts can affect overloading resolution.

In addition to types, a special static modifier is also supported. The static modifier affects how address arguments (addr, target, next, etc.) are interpreted. By default, these address arguments will represent the dynamic (a.k.a., runtime) address, defined by the formula:

    address = ELF-address + runtime-base

The static modifier changes the interpretation to the ELF file address only:

    (static)address = ELF-address

The static address also corresponds to the value used by instruction matching.

3.2.1.4 Explicit Memory Operand Arguments

It is possible to pass explicit memory operands as arguments. This is useful for reading/writing to known memory locations, such as stack memory. The syntax is the same as the matching language, e.g., mem32<(%rax)>, mem64<0x200(%rsp,%rax,8)>, etc.

3.2.1.5 Undefined Arguments

Some arguments may be undefined, e.g., op[3] for a 2-operand instruction. In this case, the NULL pointer will be passed and the type will be std::nullptr_t. This can also be used for function overloading:

    void func(std::nullptr_t x) { ... }

3.2.2 Call Trampoline ABI

Call trampolines support two Application Binary Interfaces (ABIs).

clean saves/restores the CPU state and is compatible with C/C++
naked saves/restores registers corresponding to arguments only

The ABI can be specified inside angled brackets (<...>) after the function name, e.g.:

    $ e9tool -M ... -P 'func<naked>(&op[0])@example' xterm

This will call func using the naked ABI.

The clean ABI is the default, which means E9Tool will automatically generate code for saving/restoring most of the CPU state, including all caller-saved registers %rax, %rdi, %rsi, %rdx, %rcx, %r8, %r9, %r10, and %r11. Note however that the clean ABI is different from the standard System V ABI in the following ways:

The x87/MMX/SSE/AVX/AVX2/AVX512 registers are not saved.
The stack pointer %rsp is not guaranteed to be aligned to a 16-byte boundary.

These differences exist for performance reasons, since saving/restoring the extended register state is an expensive operation. The differences are generally safe provided the patch code exclusively uses general-purpose registers. Patch binaries generated by the e9compile.sh script are guaranteed to be compatible with the clean ABI.

The naked ABI specifies that the function should be called directly and to limit the saving/restoring to registers used to pass arguments. Naked calls allow for a more fine grained control and this can be used to improve performance. However, naked calls are generally incompatible with C/C++, and the function will usually need to be implemented directly in assembly. As such, the naked ABI is not recommended unless you know what you are doing.

3.2.3 Conditional Call Trampolines

Conditional call trampolines examine the return value of the called function, and change the control flow accordingly. There are two basic forms of conditional call trampolines:

if func(...) break: if the function returns a non-zero value, then immediately return from the trampoline back to the main program.
if func(...) goto: if the function returns a non-zero value interpreted as an address, then immediately jump to that address.

The first form allows for the conditional execution of the remainder of the trampoline, possibly including the matching instruction itself. For example, consider:

    $ e9tool -M 'mnemonic=="syscall"' -P 'if filter(...)@example break' ...

The patch is placed in the default before position, i.e., will be executed as instrumentation before the matching instruction. If the filter(...) function returns a non-zero value, the trampoline will immediately return, without executing the matching instruction.

The second form allows for arbitrary jumps to be implemented. The (if func(...) goto) syntax can be thought of as shorthand for:

    if (addr = func(...)) { goto addr; }

The goto is only executed if the return value of the func is non-NULL.

3.2.4 Call Trampoline Standard Library

The main limitation of call trampolines is that the patch code cannot use standard libraries directly, including glibc. This is because the instrumentation binary is directly injected into the rewritten binary rather than dynamically/statically linked.

A parallel implementation of common libc functions is provided by the stdlib.c file. To use, simply include this file into the instrumentation code:

    #include "stdlib.c"

This version of libc is designed to be compatible with patch code. However, only a subset of libc is implemented, so it is WYSIWYG. That said, many common libc functions, including file I/O and memory allocation, have been implemented.

Unlike glibc the parallel libc is designed to be compatible with the clean ABI and handle problems, such as deadlocks, more gracefully.

3.2.5 Call Trampoline Initialization and Finalization

It is possible to define an initialization function in the instrumentation code. For example:

    #include "stdlib.c"

    static int max = 1000;

    void init(int argc, char **argv, char **envp)
    {
        environ = envp;     // Init getenv()

        const char *MAX = getenv("MAX");
        if (MAX != NULL)
            max = atoi(MAX);
    }

The initialization function must be named init, and will be called once during the patched program's initialization. For patched executables, the command line arguments (argc and argv) and the environment pointer (envp) will be passed as arguments to the function.

In the example above, the initialization function searches for an environment variable MAX, and sets the max counter accordingly.

For dynamically linked binaries, it is also possible to define a finalization function that will be called during normal program exit. For example:

    #include "stdlib.h"

    void fini(void)
    {
        fflush(stdout);
    }

The finalization function must be named fini and takes no arguments. Note that the finalization function will not be called if the program exits abnormally, such as a signal (SIGSEGV) or if the program calls "fast" exit (_exit()).

The loader also supports low-level hooks that are called before (preinit) and after (postinit) initialization. These need to be defined in special .preinit and .postinit sections, e.g.:

    asm
    (
        ".section .preinit, \"ax\"\n"
        "int3\n"
        "retq\n"
    );

If it exists, the loader will call this section before any other initialization occurs, causing the trap (int3) instruction to be executed. The preinit/postinit routines are very low-level, with no support for linkage to stdlib or other symbols.

3.2.6 Call Trampoline Dynamic Loading

The parallel libc also provides an optional implementation of the standard dynamic linker functions dlopen(), dlsym(), and dlclose(). These can be used to dynamically load shared objects at runtime, or access existing shared libraries that are already dynamically linked into the original program. To enable, define the LIBDL macro before including stdlib.c.

    #define LIBDL
    #include "stdlib.c"

The dlinit(dynamic) function must also be called in the init() routine, where dynamic is a secret fourth argument to the init() function:

    void init(int argc, char **argv, char **envp, void *dynamic)
    {
        int result = dlinit(dynamic);
        ...
    }

Once initialized, the dlopen(), dlsym(), and dlclose() functions can be used similarly to the standard libdl counterparts.

Note that function pointers returned by dlsym() should not be called directly unless you know what you are doing. This is because most libraries are compiled with the System V ABI, which is incompatible with the clean call ABI used by the instrumentation. To avoid ABI incompatibility, the external library code should be called using a special wrapper function dlcall():

    intptr_t dlcall(void *func, arg1, arg2, ...);

The dlcall() function will:

Align/restore the stack pointer to 16bytes, as required by the System V ABI.
Save/restore the extended register state, including %xmm0, etc.
Save/restore the glibc version of errno.

Be aware that the dynamic loading API has several caveats:

The dlopen(), dlsym(), and dlclose() are wrappers for the glibc versions of these functions (__libc_dlopen, etc.). The glibc versions do not officially exist, so this functionality may change at any time. Also the glibc versions lack some features, such as RTLD_NEXT, that are available with the standard libdl versions.
Since glibc is required, the original binary must be dynamically linked.
Many external library functions are not designed to be reentrant, and this may cause deadlocks if a signal occurs when the signal handler is also instrumented.
The dlcall() function supports a maximum of 16 arguments.
The dlcall() function is relatively slow, so ought to be used sparingly.

3.3 Plugin Trampolines

By design, call trampolines are very simple to use, but this also comes at the cost of efficiency. The problem is that call trampolines add an extra layer of indirection, namely, the control-flow will transfer from the main program, to the trampoline, and then to the called function. For optimal results, it is sometimes better to inline the functionality directly into the trampoline and avoid the extra level of indirection.

A very fine-grained control over the generated trampolines is possible using plugin trampolines, which allows for the precise content of trampolines to be specified directly. The downside is that low-level details, such as the saving/restoring of CPU state, must be handled manually by the trampoline code, so this method is generally only recommended for expert users only.

For more information, please see the E9Patch Programmer's Guide.

3.4 Composing Trampolines

Depending on the --match/-M and --patch/-P options, more than one patch may match a given instruction. If this occurs, then all matching trampolines will be executed in an order determined by:

The explicit (or implicit) patch position annotation, then
The command-line order for tie-breaking.

The possible values for the patch position annotation are:

before: The trampoline will be executed before the matching instruction. That is, the trampoline is instrumentation.
replace: The trampoline replaces the matching instruction.
after: The trampoline is executed after the matching instruction.

If unspecified, the default patch position is assumed to be "before", meaning that the trampoline will be executed before the matching instruction (i.e., instrumentation).

Conceptually, the individual trampolines will be arranged into a "meta" trampoline that will be executed in place of the original matching instruction. The meta trampoline has the following basic form:

        BEFORE (instruction | REPLACE) AFTER break

Here BEFORE are all before trampolines in command-line order, instruction is the original matching instruction, REPLACE is the replacement trampoline, AFTER are all after trampolines in command-line order, and break returns control-flow back to the main program.

Notes:

There can be at most one replacement trampoline. If no replacement trampoline is specified, E9Tool will execute the original matching instruction.
For the after position, the trampoline will not be executed if the matching instruction transfers control flow (i.e., for jumps taken, calls or returns).
Similarly, if any component trampoline transfers control flow (via a break or goto), the rest of the "meta" trampoline will not be executed.

For example, consider the command:

    e9tool -M 'asm=/xor.*/' -P 'after trap' -P 'replace f(...)@bin' -P print -P 'before if g(...)@bin goto' ...

Then the following "meta" trampoline will be executed in place of each xor instruction:

    print; if g(...)@bin goto; f(...)@bin; trap; break;

The print trampoline is implicitly in the before position, so is executed first. Next, the conditional call (if g(...) goto), also in the before position, will be executed. This conditional call will transfer control-flow if the g(...) function returns a non-NULL value, in which case the rest of the meta trampoline will not be executed. Otherwise, the call f(...)@bin trampoline will be executed next, which replaces the original matching xor instruction. Finally, the trap trampoline, in the after position, will be executed last.

This design makes it possible to compose instrumentation schemas. For example, one could compose AFL fuzzing instrumentation with another instrumentation for detecting memory errors.