VM_DESIGN.md

May 19, 2026 · View on GitHub

Overview

The VM in this prototype is register (rather than stack) based. Instructions are fixed-width, 32-bit values, with 8 bits for the opcode and 24 bits for the operand(s). Our instruction set slightly favors speed over minimalism.

Registers

Opcodes and Operands

Our internal opcode names include a verb/mnemonic, and a description of how the three operand bytes (A, B, and C) are used. This allows us to overload the same verb with different variants that use the operands in different ways, and make it easy to remember what's going on with each one.

Basic loading and data handling

MnemonicDescription
NOOPdo nothing (but do it very quickly)
LOAD_rA_rBR[A] := R[B] (C unused)
LOAD_rA_iBCR[A] := BC (16-bit signed value)
LOAD_rA_kBCR[A] := constants[BC] (load from constants table)
LOADNULL_rAR[A] := null (no constant pool lookup needed)
LOADV_rA_rB_kCR[A] := R[B], but verify that register B has name matching constants[C]
LOADC_rA_rB_kCR[A] := R[B], but verify name matches constants[C] and call if funcref
FUNCREF_iA_iBCR[A] := make_funcref(BC) (create function reference to function BC)
ASSIGN_rA_rB_kCR[A] := R[B] and name[A] := constants[C] (copy value and assign variable name)
NAME_rA_kBCname[A] := constants[BC] (assign variable name without changing value)
LIST_rA_iBCR[A] := new list with capacity BC
MAP_rA_iBCR[A] := new map with initial capacity BC
PUSH_rA_rBpush R[B] onto list R[A]
INDEX_rA_rB_rCR[A] := R[B][R[C]] (get element R[C] from list R[B])
IDXSET_rA_rB_rCR[A][R[B]] := R[C] (set element R[B] of list R[A] to R[C])
SLICE_rA_rB_rCR[A] := R[B][R[C]:R[C+1]] (slice; end index in adjacent register)
LOCALS_rAR[A] := new VarMap for local variables (r0-r4)

Math

| ADD_rA_rB_rC | R[A] := R[B] + R[C] | | SUB_rA_rB_rC | R[A] := R[B] - R[C] | | MULT_rA_rB_rC | R[A] := R[B] * R[C] | | DIV_rA_rB_rC | R[A] := R[B] / R[C] | | MOD_rA_rB_rC | R[A] := R[B] % R[C] | | POW_rA_rB_rC | R[A] := R[B] ^ R[C] (exponentiation) |

Logical (Fuzzy Logic)

MnemonicDescription
AND_rA_rB_rCR[A] := R[B] and R[C] (fuzzy: AbsClamp01(a * b))
OR_rA_rB_rCR[A] := R[B] or R[C] (fuzzy: AbsClamp01(a + b - a*b))
NOT_rA_rBR[A] := not R[B] (fuzzy: 1 - AbsClamp01(b))

Boolean Storage

MnemonicDescription
LT_rA_rB_rCR[A] := (R[B] < R[C])
LT_rA_rB_iCR[A] := (R[B] < C (8-bit signed))
LT_rA_iB_rCR[A] := (B (8-bit signed) < R[C])
LE_rA_rB_rCR[A] := (R[B] <= R[C])
LE_rA_rB_iCR[A] := (R[B] <= C (8-bit signed))
LE_rA_iB_rCR[A] := (B (8-bit signed) <= R[C])
EQ_rA_rB_rCR[A] := (R[B] == R[C])
EQ_rA_rB_iCR[A] := (R[B] == C (8-bit signed))
NE_rA_rB_rCR[A] := (R[B] != R[C])
NE_rA_rB_iCR[A] := (R[B] != C (8-bit signed))

Flow Control

MnemonicDescription
JUMP_iABCPC += ABC (24-bit signed value)
BRTRUE_rA_iBCif R[A] is true then PC += BC (16-bit signed); raises a runtime error if R[A] is an error
BRFALSE_rA_iBCif R[A] is false then PC += BC (16-bit signed); raises a runtime error if R[A] is an error
BRERR_rA_iBCif R[A] is an error then PC += BC (16-bit signed)
BRLT_rA_rB_iCif R[A] < R[B] then PC += C (8-bit signed)
BRLT_rA_iB_iCif R[A] < B then PC += C (8-bit signed)
BRLT_iA_rB_iCif A < R[B] then PC += C (8-bit signed)
BRLE_rA_rB_iCif R[A] <= R[B] then PC += C (8-bit signed)
BRLE_rA_iB_iCif R[A] <= B then PC += C (8-bit signed)
BRLE_iA_rB_iCif A <= R[B] then PC += C (8-bit signed)
IFLT_rA_rBif R[A] < R[B] is false then PC += 1
IFLT_rA_iBCif R[A] < BC is false then PC += 1
IFLT_iAB_rCif AB < R[C] is false then PC += 1
IFLE_rA_rBif R[A] <= R[B] is false then PC += 1
IFLE_rA_iBCif R[A] <= BC is false then PC += 1
IFLE_iAB_rCif AB <= R[C] is false then PC += 1
IFEQ_rA_rBif R[A] == R[B] is false then PC += 1
IFEQ_rA_iBCif R[A] == BC is false then PC += 1
IFNE_rA_rBif R[A] != R[B] is false then PC += 1
IFNE_rA_iBCif R[A] != BC is false then PC += 1
NEXT_rA_rBR[A] += 1; if R[A] < len(R[B]) then PC += 1
CALLF_iA_iBCcall funcs[BC] with parameters/return value at register A
CALLFN_iA_kBCcall function named constants[BC] with params/return at rA (DEPRECATED) — intrinsics are now callable FuncRefs resolved via LOADV + CALL
CALL_rA_rB_rCinvoke FuncRef in R[C], with stack frame at R[B], result to R[A]
RETURNreturn with result in R[0]
NEW_rA_rBR[A] := new map with __isa set to R[B]
ISA_rA_rB_rCR[A] := (R[B] isa R[C]) — true if identical or R[C] is in R[B]'s __isa chain
METHFIND_rA_rB_rCR[A] := method lookup on R[B] with key R[C], walking __isa chain; sets pendingSelf=R[B], pendingSuper=containing map's __isa
IDXGET_rA_rB_rCR[A] := R[B][R[C]] with type-map fallback, like METHFIND but never auto-invokes a funcRef result; clears pending context
SETSELF_rAOverride pendingSelf with R[A] (used for super.method() to preserve original self)
CALLIFREF_rAIf R[A] is a funcref and pending context exists, auto-invoke it with pending self/super; otherwise clear pending context
ITERGET_rA_rB_rCR[A] := element at position R[C] from container R[B]; for lists/strings same as INDEX, for maps returns {"key":k, "value":v}

(More opcodes will be added as the prototype develops.)

Assembly Language

While an official assembly language (or assembler) will probably not be part of MiniScript 2.0, it will be darned handy during development. So let's define one we can live with for the next year. To be somewhat easier to read and write than the above opcodes, we'll remove the operand usage codes, and instead use prefixes on the actual operands: "r" for register, "k" for constant, no prefix for a signed integer. We'll also allow some degree of constant and label lookup by the assembler itself. Examples:

Assembly CodeOpcode and Operands
LOAD r2, r5LOAD_rA_rB 2, 5, 0
LOAD r6, 42LOAD_rA_iBC 6, 42
LOAD r3, k20LOAD_rA_kBC 3, 20
LOAD r12, "foo"LOAD_rA_kBC 12, 7 (if k[7] == "foo")
ASSIGN r1, r2, "x"ASSIGN_rA_rB_kC 1, 2, 3 (if k[3] == "x")
NAME r0, "result"NAME_rA_kBC 0, 5 (if k[5] == "result")
ADD r5, r3, r4ADD_rA_rB_rC 5, 3, 4
ADD r5, r3, 42error (unless we add an ADD_rA_rB_iC opcode!)

This assembly representation is easier to read and write, but it does require the writer to remember which opcode modes are available, as the last example shows.

Comparison and Branching

For each comparison operator (LT: less than, LE: less than or equal, EQ: equal, NE: not equal), there are several related opcodes:

  • BR (branch) opcodes take a short (±127) jump if the comparison is true.
    • BRLT_rA_rB_iC jumps by C (8-bit signed) if R[A] < r[B]
    • BRLT_rA_iB_iC jumps by C (8-bit signed) if R[A] < B (8-bit signed)
  • IF opcodes execute the next instruction if the comparison is true; otherwise, they skip over it.
    • IFLT_rA_rB skips the next instruction unless R[A] < r[B]
    • IFLT_rA_iBC skips the next instruction unless R[A] < BC (16-bit signed)
  • plain comparison opcodes work just like the math opcodes
    • LT_rA_rB_rC computes R[B] < R[C], and stores the result in R[A]

For when we have a truth value already in a register (a common situation when compiling if statements), there are also:

  • BRTRUE_rA_iBC jumps ±32767 steps if the register value is truthy
  • BRFALSE_rA_iBC jumps ±32767 steps if the register value is falsey
  • BRERR_rA_iBC jumps ±32767 steps if the register value is an error

BRTRUE/BRFALSE deliberately raise a runtime error when given an error value, since an error used as an if/while condition should terminate the program. BRERR makes no such judgment — it simply tests for an error value without raising — which is what lets short-circuit and/or handle errors (see below).

Short-circuit and/or

The and and or operators short-circuit: the right operand is not evaluated when the left operand alone determines the result. Because an error value must not short-circuit through BRTRUE/BRFALSE (that would raise), the code generator emits a BRERR first to peel off the error case, after which the surviving BRTRUE/BRFALSE only ever sees a non-error value:

  • a and b: if a is an error, the result is that error (right operand skipped); if a is false, the result is 0 (right operand skipped); otherwise b is evaluated and combined with AND_rA_rB_rC.
  • a or b: if a is an error, b is evaluated and combined with OR_rA_rB_rC (so someErr or fallback yields fallback); if a is fully true, the result is 1 (right operand skipped); otherwise b is evaluated and combined with OR_rA_rB_rC.

A subtlety for or: short-circuiting to 1 is only valid when the left operand alone forces the fuzzy result to 1 — that is, when its fuzzy value is >= 1. A partial truth value such as 0.5 must not short-circuit, since 0.5 or x is genuinely fuzzy. The code generator tests this by negating: not a is false exactly when a is fully true, which reduces the question to a plain BRFALSE. (The and side needs no such trick: a fuzzy and is 0 exactly when the left operand is false, which BRFALSE already tests directly.)

Note that when emitting code, we often don't know whether a branch is going to be ±127 steps or less; we "back-patch" the jump later, once we've found the branch target. We can accomplish that with these opcodes by first emitting a long branch:

  IFLT_rA_rB 5, 3  # if r5 < r3 then
  JUMP_iABC 0      # jump (target TBD)

and then back-patching it as needed:

  IFLT_rA_rB 5, 3  # if r5 < r3 then
  JUMP_iABC 42     # jump to end of loop

Then, in an optimization phase, we can easily spot IF-JUMP pairs where the jump offset is small enough to replace the pair with a BRanch:

  BRLT_rA_rB_iC 5, 3, 42  # if r5 < r3 then goto end of loop
  NOOP

And the NOOP could then be optimized away, making the code shorter. (All jumps/branches that cross this point would have to be updated, but as that only makes the branch targets shorter, this can always be done.)

Note that all jump/branch targets are relative to the next instruction. So, JUMP_iABC 0 would do the same as NOOP, and JUMP_iABC -1 would put the machine into a tight infinite loop.

Function Calls

(To-Do.)