Lambda Runtime Data Management

March 6, 2026

Lambda Data Structures

Lambda runtime uses the following design/convention to represent and manage its runtime data:

  • for simple scalar types: LMD_TYPE_NULL, LMD_TYPE_BOOL, LMD_TYPE_INT
    • they are packed into Item, with high bits set to TypeId;
  • for compound scalar types: LMD_TYPE_INT64, LMD_TYPE_FLOAT, LMD_TYPE_DTIME, LMD_TYPE_DECIMAL, LMD_TYPE_SYMBOL, LMD_TYPE_STRING, LMD_TYPE_BINARY
    • they are packed into Item as a tagged pointer: a pointer to the actual data, with the high bits set to the TypeId;
    • LMD_TYPE_INT64, LMD_TYPE_FLOAT, LMD_TYPE_DTIME are stored in GC nursery (bump-allocated numeric slots);
    • LMD_TYPE_DECIMAL, LMD_TYPE_SYMBOL, LMD_TYPE_STRING, LMD_TYPE_BINARY are allocated from GC heap;
  • for container types: LMD_TYPE_LIST, LMD_TYPE_RANGE, LMD_TYPE_ARRAY_INT, LMD_TYPE_ARRAY_INT64, LMD_TYPE_ARRAY_FLOAT, LMD_TYPE_ARRAY, LMD_TYPE_MAP, LMD_TYPE_VMAP, LMD_TYPE_ELEMENT
    • they are direct pointers to the container data.
    • all containers extend struct Container, which starts with a TypeId field;
    • they are heap allocated and GC-managed;
  • Lambda map/LMD_TYPE_MAP, uses a packed struct:
    • its fields are defined as a linked list of ShapeEntry;
    • and the actual data is stored as a packed struct;
  • Lambda element/LMD_TYPE_ELEMENT, extends Lambda list/LMD_TYPE_LIST, and it's also a map/LMD_TYPE_MAP at the same time;
    • note that it can be cast to List directly, but not to Map directly;
  • Lambda VMap/LMD_TYPE_VMAP, virtual map with vtable dispatch:
    • supports arbitrary key types and pluggable backends (HashMap, TreeMap, etc.);
    • type(vmap) returns "map" — transparent to Lambda scripts;
  • the get_type_id() function returns the TypeId of any Item in a general manner;

Item Bit Layout

The Item type is a 64-bit tagged union defined in lambda.hpp. The top 8 bits (_type_id) encode the type, the lower 56 bits encode the value or pointer:

|  8-bit TypeId  |        56-bit payload        |
| bits 63..56    |        bits 55..0             |

Three categories of storage:

| Category | TypeId field | 56-bit payload | type_id() method |
|---|---|---|---|
| Inline scalars (int, bool, null) | _type_id > 0 | Value packed directly | Reads _type_id |
| Tagged pointers (int64, float, datetime, string, symbol, decimal, binary) | _type_id > 0 | Pointer to heap/stack data | Reads _type_id |
| Container pointers (list, array, map, element, range, etc.) | _type_id == 0 | Full 64-bit pointer | Dereferences Container::type_id |

The type_id() method first checks _type_id; if zero, dereferences the pointer to read the container's embedded type ID. This is why container pointers use the full 64 bits (no tag bits stolen from the pointer).
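The two-step lookup described above can be sketched in plain C. This is a minimal illustration, not the runtime's code: `SkContainer`, `sk_type_id`, and the `SK_` constants are hypothetical names, and the sketch assumes a platform where ordinary user-space pointers have a zero top byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical stand-ins; the values mirror the TypeId table in this document */
enum { SK_TYPE_INT = 3, SK_TYPE_LIST = 12 };

/* every container starts with its TypeId, as described for struct Container */
typedef struct { uint8_t type_id; } SkContainer;

/* returns the type of a 64-bit item: tag byte first, container header second */
static uint8_t sk_type_id(uint64_t item) {
    uint8_t tag = (uint8_t)(item >> 56);
    if (tag != 0) return tag;               /* inline scalar or tagged pointer */
    const SkContainer *c = (const SkContainer *)(uintptr_t)item;
    assert(c != NULL);                      /* sketch only: no null handling */
    return c->type_id;                      /* untagged container pointer */
}
```

A packed int reports its type from the tag byte alone, while a list pointer is dereferenced to read the header field.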

TypeId Enum Values

| Value | TypeId | Category |
|---|---|---|
| 0 | LMD_TYPE_RAW_PTR | raw pointer (untagged) |
| 1 | LMD_TYPE_NULL | inline scalar |
| 2 | LMD_TYPE_BOOL | inline scalar |
| 3 | LMD_TYPE_INT | inline scalar (int56) |
| 4 | LMD_TYPE_INT64 | tagged pointer (GC nursery) |
| 5 | LMD_TYPE_FLOAT | tagged pointer (GC nursery) |
| 6 | LMD_TYPE_DECIMAL | tagged pointer (heap) |
| 7 | LMD_TYPE_NUMBER | abstract type (union of int/int64/float/decimal) |
| 8 | LMD_TYPE_DTIME | tagged pointer (GC nursery) |
| 9 | LMD_TYPE_SYMBOL | tagged pointer (heap, pooled ≤32 chars) |
| 10 | LMD_TYPE_STRING | tagged pointer (heap) |
| 11 | LMD_TYPE_BINARY | tagged pointer (heap) |
| 12 | LMD_TYPE_LIST | container pointer |
| 13 | LMD_TYPE_RANGE | container pointer |
| 14 | LMD_TYPE_ARRAY_INT | container pointer |
| 15 | LMD_TYPE_ARRAY_INT64 | container pointer |
| 16 | LMD_TYPE_ARRAY_FLOAT | container pointer |
| 17 | LMD_TYPE_ARRAY | container pointer (generic) |
| 18 | LMD_TYPE_MAP | container pointer |
| 19 | LMD_TYPE_VMAP | container pointer |
| 20 | LMD_TYPE_ELEMENT | container pointer |
| 21 | LMD_TYPE_TYPE | type meta |
| 22 | LMD_TYPE_FUNC | function pointer |
| 23 | LMD_TYPE_ANY | abstract (wildcard type) |
| 24 | LMD_TYPE_ERROR | error sentinel |

Lambda Type → C Runtime Type Mapping

When the transpiler emits unboxed (native) C code for typed variables and parameters, each Lambda type maps to a C type:

| Lambda Type | TypeId | C Runtime Type | Boxing | Notes |
|---|---|---|---|---|
| null | LMD_TYPE_NULL | Item | packed | high bits = TypeId, value = 0 |
| bool | LMD_TYPE_BOOL | bool | packed | b2it() / it2b() |
| int | LMD_TYPE_INT | int64_t | packed (int56) | i2it() / it2i(). Changed from int32_t to int64_t to support full 56-bit range without truncation |
| int64 | LMD_TYPE_INT64 | int64_t | tagged pointer | l2it() / it2l(), stored in GC nursery |
| float | LMD_TYPE_FLOAT | double | tagged pointer | d2it() / it2d(), stored in GC nursery |
| datetime | LMD_TYPE_DTIME | DateTime | tagged pointer | stored in GC nursery |
| decimal | LMD_TYPE_DECIMAL | Decimal* | tagged pointer | heap-allocated, GC-managed |
| symbol | LMD_TYPE_SYMBOL | String* | tagged pointer | heap-allocated, GC-managed |
| string | LMD_TYPE_STRING | String* | tagged pointer | heap-allocated, GC-managed |
| binary | LMD_TYPE_BINARY | String* | tagged pointer | heap-allocated, GC-managed |
| list | LMD_TYPE_LIST | List* | direct pointer | container, GC-managed |
| array | LMD_TYPE_ARRAY | Array* | direct pointer | container, GC-managed |
| array_int | LMD_TYPE_ARRAY_INT | ArrayInt* | direct pointer | container, int56 elements |
| array_int64 | LMD_TYPE_ARRAY_INT64 | ArrayInt64* | direct pointer | container, int64 elements |
| array_float | LMD_TYPE_ARRAY_FLOAT | ArrayFloat* | direct pointer | container, double elements |
| map | LMD_TYPE_MAP | Map* | direct pointer | container, packed struct |
| vmap | LMD_TYPE_VMAP | VMap* | direct pointer | container, vtable dispatch |
| element | LMD_TYPE_ELEMENT | Element* | direct pointer | extends List, also acts as Map |
| range | LMD_TYPE_RANGE | Range* | direct pointer | container |
| any / untyped | LMD_TYPE_ANY | Item | | generic tagged value |

int32 → int64 change: Lambda int was previously transpiled as C int32_t. It now transpiles as int64_t to match the 56-bit range of the i2it() packed representation. This avoids silent truncation when values exceed 32-bit range (e.g., deep_sum(10000) = 5,000,050,000).

Boxing Macros (primitive → Item)

Defined in lambda.h. Each takes a native C value and returns a uint64_t encoding the Item:

| Macro | Signature | Semantics |
|---|---|---|
| i2it(val) | int64_t → uint64_t | Range-checked int56: if INT56_MIN ≤ val ≤ INT56_MAX, returns ITEM_INT \| (val & MASK56), else ITEM_ERROR |
| b2it(val) | uint8_t → uint64_t | If val ≥ BOOL_ERROR(2), returns ITEM_ERROR. Otherwise (LMD_TYPE_BOOL<<56) \| val |
| l2it(ptr) | int64_t* → uint64_t | Tagged pointer: (LMD_TYPE_INT64<<56) \| ptr. Returns ITEM_NULL if ptr is NULL |
| d2it(ptr) | double* → uint64_t | Tagged pointer: (LMD_TYPE_FLOAT<<56) \| ptr. Returns ITEM_NULL if NULL |
| s2it(ptr) | String* → uint64_t | Tagged pointer: (LMD_TYPE_STRING<<56) \| ptr. Returns ITEM_NULL if NULL |
| y2it(ptr) | Symbol* → uint64_t | Tagged pointer: (LMD_TYPE_SYMBOL<<56) \| ptr. Returns ITEM_NULL if NULL |
| k2it(ptr) | DateTime* → uint64_t | Tagged pointer: (LMD_TYPE_DTIME<<56) \| ptr. Returns ITEM_NULL if NULL |
| c2it(ptr) | Decimal* → uint64_t | Tagged pointer: (LMD_TYPE_DECIMAL<<56) \| ptr. Returns ITEM_NULL if NULL |
| x2it(ptr) | Binary* → uint64_t | Tagged pointer: (LMD_TYPE_BINARY<<56) \| ptr. Returns ITEM_NULL if NULL |

Int56 range constants:

#define INT56_MAX  ((int64_t)0x007FFFFFFFFFFFFF)   // +36,028,797,018,963,967
#define INT56_MIN  ((int64_t)0xFF80000000000000LL)  // -36,028,797,018,963,968

Overflow behavior: i2it returns ITEM_ERROR if the value exceeds the 56-bit signed range.
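The range check, masking, and sign extension can be sketched as follows. This is an illustration under stated assumptions rather than the macros from lambda.h: the `sk_` names are hypothetical, and the error value is assumed to be the LMD_TYPE_ERROR tag (24) with a zero payload.

```c
#include <assert.h>
#include <stdint.h>

#define SK_INT56_MAX  ((int64_t)0x007FFFFFFFFFFFFFLL)
#define SK_INT56_MIN  (-SK_INT56_MAX - 1)
#define SK_MASK56     0x00FFFFFFFFFFFFFFULL
#define SK_TAG_INT    ((uint64_t)3 << 56)    /* LMD_TYPE_INT in the high byte */
#define SK_ITEM_ERROR ((uint64_t)24 << 56)   /* assumed encoding of the error item */

/* box: reject values outside the signed 56-bit range, else mask and tag */
static uint64_t sk_i2it(int64_t v) {
    if (v < SK_INT56_MIN || v > SK_INT56_MAX) return SK_ITEM_ERROR;
    return SK_TAG_INT | ((uint64_t)v & SK_MASK56);
}

/* unbox: drop the tag, then sign-extend from bit 55 */
static int64_t sk_get_int56(uint64_t item) {
    int64_t payload = (int64_t)(item & SK_MASK56);   /* always in [0, 2^56) */
    if (payload >> 55) payload -= (int64_t)1 << 56;  /* bit 55 set: negative value */
    return payload;
}
```

Negative values survive the round trip because bit 55 of the payload acts as the sign bit; one past INT56_MAX is rejected instead of being silently wrapped.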

Unboxing Functions (Item → primitive)

Defined in lambda-data.cpp. Each takes an Item and extracts the native C value:

| Function | Signature | Semantics |
|---|---|---|
| it2i(Item) | Item → int64_t | INT→get_int56(), INT64→get_int64(), FLOAT→cast, BOOL→0/1, ERROR→0 |
| it2l(Item) | Item → int64_t | INT→get_int56(), INT64→get_int64(), FLOAT→cast, BOOL→0/1. Returns INT64_MAX on unrecognized type |
| it2d(Item) | Item → double | INT→cast via get_int56(), INT64→cast get_int64(), FLOAT→get_double(), DECIMAL→decimal_to_double(), ERROR→NAN |
| it2b(Item) | Item → bool | BOOL→bool_val, NULL/ERROR→false, INT→get_int56()!=0, FLOAT→!isnan&&!=0.0, STRING→len>0, others→true |
| it2s(Item) | Item → String* | STRING→get_string(), ERROR→static "<error>", others→nullptr |

Subtle difference: it2i returns 0 on error; it2l returns INT64_MAX on unrecognized types. it2l is the preferred unboxer for INT64 contexts.

Boxing Idempotency Properties

This table documents which boxing operations are safe to apply multiple times (idempotent) and which are not. This is critical for transpiler correctness when a value might already be boxed:

| Type | Boxing | Idempotent? | Why |
|---|---|---|---|
| STRING / SYMBOL / DECIMAL / BINARY | inline OR tag on pointer | ✅ Yes | OR-ing the same tag is a no-op |
| INT | i2it() / emit_box_int (range check + mask + OR) | ⚠️ Mostly | mask56 strips any existing tag, then adds INT tag. But a boxed INT64 pointer value (≈2.88e17) exceeds INT56_MAX range check → returns ITEM_ERROR |
| INT64 | push_l() (allocates in GC nursery) | ❌ No | Each call allocates new storage; double-boxing creates a pointer-to-a-tagged-pointer |
| INT64 | push_l_safe() | ✅ Yes | Checks high byte tag first: if already boxed INT64, returns as-is; if boxed INT, extracts and re-boxes |
| FLOAT | push_d() (allocates in GC nursery) | ❌ No | Same allocation issue as INT64 |
| DTIME | push_k() (allocates in GC nursery) | ❌ No | Same allocation issue as INT64 |
| BOOL | b2it() / emit_box_bool | ✅ Yes | Tag is in high bits, value is in low bits |
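The difference between the idempotent and non-idempotent rows can be seen in a small model. This is a sketch with hypothetical names (`sk_tag`, `sk_push_l`, `sk_slots`); a fixed static array stands in for the GC nursery.

```c
#include <assert.h>
#include <stdint.h>

#define SK_TYPE_INT64  4
#define SK_TYPE_STRING 10

/* OR a type tag into the high byte of a 64-bit word */
static uint64_t sk_tag(uint64_t bits, uint8_t tid) {
    return ((uint64_t)tid << 56) | bits;
}

/* stand-in for the nursery: each boxing call consumes a fresh slot */
static int64_t sk_slots[16];
static int sk_used = 0;

static uint64_t sk_push_l(int64_t v) {
    assert(sk_used < 16);
    int64_t *slot = &sk_slots[sk_used++];
    *slot = v;                               /* stores whatever bits it is given */
    return sk_tag((uint64_t)(uintptr_t)slot, SK_TYPE_INT64);
}
```

Tagging an already tagged string pointer again yields the same word. Calling `sk_push_l` on an already boxed item instead stores the tagged word itself in a new slot, which is the pointer-to-a-tagged-pointer case from the table.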

Header Files

Lambda header files define the runtime data. They are layered one on top of another, from the basic data structs up to the full runtime transpiler and runner definitions.

  • lambda.h:
    • the fundamental data structures of Lambda;
    • the C version is for MIR JIT compiler;
      • thus it defines the API of Lambda runtime that is exposed to C2MIR JIT compiler;
    • the C++ version is for the manual-written/AOT-compiled Lambda runtime code;
  • lambda.hpp:
    • C++ Item struct with union members for all tagged pointer variants;
    • ConstItem for read-only access;
    • get_int56() sign-extension logic;
    • Container structs: Range, List, ArrayInt, ArrayInt64, ArrayFloat, Map, Element, VMap;
    • Error propagation guard macros: GUARD_ERROR1/2/3, GUARD_BOOL_ERROR1/2, GUARD_DATETIME_ERROR1/2/3;
  • lambda-data.hpp:
    • the full C++ definitions of the data structures and the API functions to work with the data;
    • input parsers work at this level;
  • ast.hpp:
    • the AST built from Tree-sitter syntax tree;
    • SysFuncInfo struct and sys_funcs[] table for system function registration;
    • the Lambda validator and formatter work at this level;
  • transpiler.hpp:
    • the full Lambda transpiler and code runner;

GC Nursery: Numeric Value Storage

The GC nursery (defined in lib/gc_nursery.h, lib/gc_nursery.c) is a bump-allocated block chain used to store compound scalar values (int64, double, DateTime) that are too large to inline in the 56-bit Item payload. It replaces the previous num_stack implementation.

Data Structures

typedef union {
    int64_t as_long;
    double as_double;
    DateTime as_datetime;
} gc_num_value_t;                       // 8 bytes per value

typedef struct gc_nursery_block {
    gc_num_value_t *data;               // array of elements
    size_t capacity;                    // max elements in this block
    size_t used;                        // currently used elements
    struct gc_nursery_block *next;      // singly-linked
} gc_nursery_block_t;

typedef struct gc_nursery {
    gc_nursery_block_t *head;           // first block
    gc_nursery_block_t *current;        // current write block
    size_t block_size;                  // elements per block
    size_t total_allocated;             // total elements allocated
} gc_nursery_t;

Growth Strategy

When the current block is full, a new block of the same block_size capacity is allocated and linked. Default block size is GC_NURSERY_BLOCK_SIZE / sizeof(gc_num_value_t) (~4096 values).
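A simplified model of this growth strategy is sketched below. It follows the struct layout shown above but uses hypothetical `sk_` names, plain malloc, and a two-member value union; it is not the code in lib/gc_nursery.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef union { int64_t as_long; double as_double; } sk_value_t;

typedef struct sk_block {
    sk_value_t *data;
    size_t capacity, used;
    struct sk_block *next;
} sk_block_t;

typedef struct {
    sk_block_t *head, *current;
    size_t block_size, total_allocated;
} sk_nursery_t;

static sk_block_t *sk_block_new(size_t capacity) {
    sk_block_t *b = malloc(sizeof *b);
    if (!b) return NULL;
    b->data = malloc(capacity * sizeof *b->data);
    if (!b->data) { free(b); return NULL; }
    b->capacity = capacity; b->used = 0; b->next = NULL;
    return b;
}

/* returns 1 on success, 0 if the first block could not be allocated */
static int sk_nursery_init(sk_nursery_t *n, size_t block_size) {
    assert(block_size > 0);
    n->head = n->current = sk_block_new(block_size);
    n->block_size = block_size;
    n->total_allocated = 0;
    return n->head != NULL;
}

/* bump-allocate one slot; chain a new same-sized block when the current one is full */
static sk_value_t *sk_nursery_alloc(sk_nursery_t *n) {
    if (n->current->used == n->current->capacity) {
        sk_block_t *b = sk_block_new(n->block_size);
        if (!b) return NULL;
        n->current->next = b;
        n->current = b;
    }
    n->total_allocated++;
    return &n->current->data[n->current->used++];
}

/* bulk-free every block in the chain */
static void sk_nursery_destroy(sk_nursery_t *n) {
    for (sk_block_t *b = n->head; b != NULL; ) {
        sk_block_t *next = b->next;
        free(b->data);
        free(b);
        b = next;
    }
    n->head = n->current = NULL;
}
```

Because full blocks are chained rather than reallocated, a slot's address never changes after allocation, which is what lets a tagged pointer to it remain valid.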

Push Functions

Defined in lambda-mem.cpp. Each bump-allocates a slot in the GC nursery and returns a tagged Item:

| Function | Signature | Semantics |
|---|---|---|
| push_d(double) | double → Item | Allocates in nursery via gc_nursery_alloc_double(), returns {.item = d2it(ptr)} |
| push_l(int64_t) | int64_t → Item | Allocates in nursery via gc_nursery_alloc_long(), returns {.item = l2it(ptr)}. Returns ItemError if val == INT64_ERROR |
| push_l_safe(int64_t) | int64_t → Item | MIR JIT workaround: checks high byte first. If already-boxed INT64, returns as-is; if boxed INT, extracts via get_int56() and re-boxes as INT64; otherwise delegates to push_l() |
| push_k(DateTime) | DateTime → Item | Checks for DATETIME_IS_ERROR() sentinel first. Allocates in nursery via gc_nursery_alloc_datetime(), returns {.item = k2it(ptr)} |

All push functions check context->nursery != NULL and return ItemError on failure.

Lifecycle

The GC nursery lives in EvalContext and persists for the duration of script execution. All nursery blocks are bulk-freed when the nursery is destroyed at context cleanup.

Two Transpiler Architectures

Lambda has two JIT compilation paths that share the same runtime functions but generate code differently:

C2MIR Transpiler (transpile.cpp)

Pipeline: AST → C source code → c2mir → MIR IR → native

  1. Generates C source code as a string (StrBuf) from the Lambda AST
  2. Feeds the C code through c2mir_compile() (C-to-MIR compiler)
  3. MIR is then JIT-compiled to native machine code via MIR_gen()

Key characteristics:

  • All runtime calls are C function calls by name (resolved at link time via import_resolver)
  • Typed arrays fully supported: checks TypeArray::nested to emit array_int(), array_int64(), array_float() or generic array()
  • Uses C statement expressions ({ ... }) extensively for expression-oriented code
  • For-loop iteration dispatches on typed arrays for unboxed element access
  • More mature, feature-complete
  • Formerly the default path; now the legacy path, selected with the --c2mir flag: ./lambda.exe --c2mir script.ls
  • Generated C code can be inspected in ./temp/_transpiled*.c for debugging

MIR Direct Transpiler (transpile-mir.cpp)

Pipeline: AST → MIR IR instructions directly → native

  1. Builds MIR instructions directly using the MIR_new_insn() API
  2. Creates functions, registers, labels, and control flow in MIR IR
  3. JIT-compiled to native machine code via MIR_gen()

Key characteristics:

  • Skips C code generation entirely — more efficient compilation
  • Inline boxing operations (e.g., emit_box_int() generates range-check + tag MIR instructions instead of calling i2it)
  • Does not yet support typed arrays — always uses generic Array* for all array literals
  • Closure support with mutable capture via env struct write-back
  • Proc support with in_proc flag and multi-value return path
  • Cross-module calls for imported functions (resolves wrappers when needed)
  • Runtime functions imported via proto/import declarations resolved by the same import_resolver
  • This is now the default JIT path, so no flag is needed: ./lambda.exe script.ls (the earlier form, ./lambda.exe --mir script.ls, is no longer required)
  • Use --c2mir flag to use the legacy C2MIR path: ./lambda.exe --c2mir script.ls

Comparison of Transpiler Approaches

| Aspect | C2MIR (transpile.cpp) | MIR Direct (transpile-mir.cpp) |
|---|---|---|
| Code generation | Generates C source text | Generates MIR IR instructions |
| Compilation steps | 2 (C→MIR, MIR→native) | 1 (MIR→native) |
| Typed arrays | ✅ Full: construction, access (array_int_get), mutation (array_int_set) | ❌ Always generic array() |
| Inline boxing | ❌ Calls runtime macros | ✅ Inline MIR instructions |
| Closures | ✅ Supported | ✅ Supported (with mutable capture via env write-back) |
| Variadic params | ✅ Supported | ✅ Supported |
| String patterns | ✅ Supported | ✅ Supported |
| Module imports | ✅ Full support | ✅ Supported (cross-module calls with wrapper resolution) |
| Proc support | ✅ Supported | ✅ Supported (in_proc flag, multi-value return) |
| Bitwise operators | ✅ Supported | ✅ Native int arg dispatch (band/bor/bxor/bnot/shl/shr) |
| Debugging | Check temp/_transpiled*.c | No intermediate output |

Test coverage: 113/113 tests pass (90 functional + 26 procedural + 3 chart tests, minus 6 excluded). All tests produce identical output to the C2MIR path.

System Function Dispatch

Both transpilers dispatch system function calls using the same mechanism:

  1. SysFuncInfo table (build_ast.cpp): 118 entries mapping Lambda function names to metadata (arg count, return type, C function name prefix)
  2. C function naming: fn_ prefix for pure functions, pn_ prefix for procedures. Overloaded functions append arg count: fn_min1, fn_min2
  3. Import resolution (mir.c): import_resolver() does linear scan of 306-entry func_list[] array, matching by name
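The naming rule and the linear name lookup can be modelled as below. This is a sketch, not the code in mir.c: the entry struct, the three-entry table, and the `sk_` functions are hypothetical, and the real resolver's signature is not shown in this document.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef void (*sk_fn_t)(void);
typedef struct { const char *name; sk_fn_t func; } sk_entry_t;

/* placeholder runtime functions; only their addresses matter here */
static void sk_fn_min1(void) {}
static void sk_fn_min2(void) {}
static void sk_pn_clock(void) {}

static const sk_entry_t sk_func_list[] = {
    { "fn_min1",  sk_fn_min1 },
    { "fn_min2",  sk_fn_min2 },
    { "pn_clock", sk_pn_clock },
};

/* build the C symbol name: fn_/pn_ prefix, plus the arg count when overloaded */
static void sk_c_name(char *out, size_t cap, const char *name,
                      int is_proc, int is_overloaded, int argc) {
    const char *prefix = is_proc ? "pn_" : "fn_";
    if (is_overloaded) snprintf(out, cap, "%s%s%d", prefix, name, argc);
    else               snprintf(out, cap, "%s%s", prefix, name);
}

/* linear scan by name; returns NULL when the symbol is unknown */
static sk_fn_t sk_resolve(const char *name) {
    assert(name != NULL);
    size_t n = sizeof sk_func_list / sizeof sk_func_list[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(sk_func_list[i].name, name) == 0) return sk_func_list[i].func;
    return NULL;
}
```

For example, the overloaded two-argument `min` becomes `fn_min2`, which the scan then maps to a function address. A linear scan is simple and costs one pass over the table per imported symbol.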

SysFuncInfo Structure

typedef struct SysFuncInfo {
    SysFunc fn;                 // enum identifier (e.g., SYSFUNC_SUM)
    const char* name;           // Lambda name (e.g., "sum")
    int arg_count;              // expected args (-1 for variadic)
    Type* return_type;          // Lambda return type
    bool is_proc;               // true for side-effecting functions
    bool is_overloaded;         // true if same name with different arg counts
    bool is_method_eligible;    // true if callable as obj.method()
    TypeId first_param_type;    // type constraint on first param
    bool can_raise;             // true if may return error (T^ type)
} SysFuncInfo;

C Return Type vs Lambda Return Type

The return_type in SysFuncInfo is the Lambda-level semantic type, not the C return type. Some system functions share the same Lambda return type but differ in C:

| C return type | Transpiler handling | Example functions |
|---|---|---|
| Item (boxed) | No post-processing needed | fn_sum, fn_add, fn_div, most generic functions |
| int64_t (native) | Box with i2it()/emit_box_int | fn_len, fn_count |
| Bool (native) | Box with b2it()/emit_box_bool | fn_eq, fn_lt, fn_is, fn_in |
| String* (native) | Box with s2it()/emit_box_string | fn_strcat, fn_lower, fn_upper |
| double (native) | Box with d2it()/emit_box_float | pn_clock |
| DateTime (native) | Box with k2it()/emit_box_dtime | fn_datetime0, fn_date0 |
| Type* (native) | Box with emit_box_type | fn_type |

The transpiler uses the SysFunc enum value in a switch statement to determine the actual C return type. Functions not listed in the switch default to returning Item (already boxed).

Function Parameter Handling

Parameter Count Mismatch

  • Missing arguments: automatically filled with ITEM_NULL at transpile time
  • Extra arguments: discarded with warning logged at transpile time
  • Enables optional parameter patterns: if (opt == null) "default" else opt
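The padding and truncation rules can be illustrated with a small runtime model. The real transpiler applies them to the argument list at transpile time; `sk_fit_args` is a hypothetical helper, and the null item is assumed to be the LMD_TYPE_NULL tag with a zero payload, as the type mapping table states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ITEM_NULL ((uint64_t)1 << 56)   /* LMD_TYPE_NULL tag, zero payload */

/* copy args into out[0..param_count), padding with null items;
   returns the number of extra arguments that were discarded */
static size_t sk_fit_args(const uint64_t *args, size_t argc,
                          uint64_t *out, size_t param_count) {
    assert(out != NULL && (args != NULL || argc == 0));
    for (size_t i = 0; i < param_count; i++)
        out[i] = (i < argc) ? args[i] : SK_ITEM_NULL;
    return argc > param_count ? argc - param_count : 0;
}
```

A non-zero return value corresponds to the case where the transpiler logs a warning about discarded arguments.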

Type Matching

  • Argument types validated against parameter types during AST building
  • Type errors accumulate (up to 10) before stopping transpilation
  • Compatible types: int → float (automatic coercion), ANY accepts all types

Boxing/Unboxing at Function Boundaries

Primitive ↔ Item conversions at function boundaries:

| Direction | Functions | Use Case |
|---|---|---|
| Boxing (primitive → Item) | i2it(), l2it(), d2it(), b2it(), s2it() | Return values from typed functions |
| Unboxing (Item → primitive) | it2i(), it2l(), it2d(), it2b() | Pass Item args to typed parameters |

transpile_box_item: Smart Boxing in MIR Transpiler

The MIR transpiler's transpile_box_item() function is a critical gateway that decides how to box a sub-expression result into an Item. It must know whether transpile_expr() returned a native value (needs boxing) or a boxed Item (return as-is):

| Sub-expression type | transpile_expr returns | transpile_box_item action |
|---|---|---|
| INT literal | native int64_t | emit_box_int() (inline range check + tag) |
| INT64 literal | boxed Item (via emit_load_const_boxed) | return as-is |
| FLOAT literal | boxed Item (via emit_load_const_boxed) | return as-is |
| INT + INT binary | native int64 (MIR ADD) | emit_box_int() |
| INT / INT binary | native double (MIR DDIV) | emit_box_float() |
| INT64 binary (any op) | boxed Item (generic fallback via fn_add etc.) | return as-is |
| Comparison (EQ, LT, etc.) | native bool | emit_box_bool() |
| System function call | depends on c_ret_tid | varies by function |
| Identifier / variable | whatever the variable holds | emit_box() by AST type |
| ANY / ERROR / NULL type | boxed Item | return as-is |

Key challenge: for INT64 operations, transpile_expr sometimes returns a raw int64_t (for example, fn_int64 calls after unboxing) and sometimes a boxed Item (INT64 literals and the generic binary fallback). The push_l_safe() function was introduced to handle this inconsistency safely.

String Memory Management

Lambda uses three distinct string allocation strategies optimized for different use cases:

1. Names (Structural Identifiers)

Function: heap_create_name(const char* str, size_t len)
Pooling: Always pooled in NamePool (string interning)
Use Cases:

  • Map keys
  • Element tag names
  • Element attribute names
  • Function names
  • Variable names
  • Any structural identifier that appears multiple times

Benefits:

  • Same name string always returns same pointer (identity comparison)
  • Memory sharing across entire document hierarchy
  • Inherits from parent NamePool (schemas share names with instances)

2. Symbols (Short Identifiers)

Function: heap_create_symbol(const char* str, size_t len)
Pooling: Conditionally pooled (only if length ≤ 32 chars)
Use Cases:

  • Symbol literals in Lambda code: 'mySymbol
  • Short identifier strings
  • Enum-like values

Benefits:

  • Common short symbols are pooled (memory sharing)
  • Long symbols fall back to arena allocation (no overhead)

Size Limit: NAME_POOL_SYMBOL_LIMIT = 32 characters

3. Strings (Content Data)

Function: heap_strcpy(const char* str, size_t len) or builder.createString()
Pooling: Never pooled (arena allocated)
Use Cases:

  • User content text
  • String values in documents
  • Free-form text data
  • Anything that's not a structural identifier

Benefits:

  • Fast arena allocation (no hash lookup overhead)
  • No memory overhead for unique content
  • Efficient for one-time strings

API Decision Guide

| String Type | Function | Pooled? | Use When |
|---|---|---|---|
| Name | heap_create_name() or builder.createName() | ✅ Always | Map keys, element tags, attribute names, identifiers |
| Symbol | heap_create_symbol() | ✅ If ≤32 chars | Symbol literals, short enum-like values |
| String | heap_strcpy() or builder.createString() | ❌ Never | User content, text data, unique values |

Rule of Thumb: If it's a structural name that will appear many times, use createName(). If it's content data, use createString().
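The three strategies differ mainly in whether a lookup happens before allocation. The sketch below models that with hypothetical names (`sk_pool_t`, `sk_intern`, `sk_create_symbol`, `sk_copy`); it uses malloc and a linear scan where the real NamePool presumably uses a hash lookup and arena allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SK_SYMBOL_LIMIT 32   /* mirrors NAME_POOL_SYMBOL_LIMIT */
#define SK_POOL_CAP     64

typedef struct { char *entries[SK_POOL_CAP]; size_t count; } sk_pool_t;

/* unpooled copy: every call returns fresh storage (the "string" strategy) */
static char *sk_copy(const char *s, size_t len) {
    char *p = malloc(len + 1);
    if (!p) return NULL;
    memcpy(p, s, len);
    p[len] = '\0';
    return p;
}

/* pooled: equal text always yields the same pointer (the "name" strategy) */
static const char *sk_intern(sk_pool_t *pool, const char *s, size_t len) {
    for (size_t i = 0; i < pool->count; i++)
        if (strlen(pool->entries[i]) == len && memcmp(pool->entries[i], s, len) == 0)
            return pool->entries[i];
    assert(pool->count < SK_POOL_CAP);
    char *p = sk_copy(s, len);
    if (p) pool->entries[pool->count++] = p;
    return p;
}

/* symbols: pooled only up to the length limit, otherwise copied */
static const char *sk_create_symbol(sk_pool_t *pool, const char *s, size_t len) {
    return len <= SK_SYMBOL_LIMIT ? sk_intern(pool, s, len) : sk_copy(s, len);
}
```

Interned names can be compared by pointer identity, while two long symbols with the same text compare equal only by content.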

NamePool Hierarchy

NamePools support parent-child relationships for schema inheritance.

Benefits:

  • Schema definitions share names with document instances
  • No memory duplication for inherited names
  • Efficient for validation and transformation pipelines

Memory Management

Lambda Script uses automatic memory management with a garbage collector (GC) and memory pools:

GC Heap

All heap-allocated runtime objects (strings, symbols, decimals, containers, functions) are managed by the GC heap (lib/gc_heap.h, lib/gc_heap.c):

  • Each allocation is prepended with a GCHeader and linked into an intrusive singly-linked list
  • All GC-managed memory is pool-allocated via pool_alloc() for efficiency
  • At context end, gc_heap_destroy() calls pool_destroy() to bulk-free all memory
  • No manual memory management required

GC Nursery

Compound scalar values (int64, double, DateTime) are stored in the GC nursery (lib/gc_nursery.h, lib/gc_nursery.c):

  • Bump-allocated blocks for fast numeric value storage
  • All nursery memory bulk-freed at context end
  • Replaces the previous num_stack implementation

Memory Pools

  • Objects are allocated from memory pools (rpmalloc-based) for efficiency
  • Pools are automatically managed by the runtime
  • Reduces fragmentation and improves performance
  • pool_destroy() bulk-frees all pool memory at context end as a safety net

Immutability

  • Most data structures are immutable by default
  • Immutability eliminates many memory safety issues
  • Structural sharing for efficient memory usage

// Immutable collections
let list1 = (1, 2, 3);
let list2 = (0, list1...);  // Shares structure with list1

// Mutable collections (arrays)
let arr = [1, 2, 3];
// arr is mutable, but assignment creates new references

Coding Guidelines

  • Start comments in lowercase.
  • Add debug logging for development and troubleshooting.
  • Test with comprehensive nested data structures and use timeout (default: 5s) to catch hangs early
  • Back up the file before major refactoring or rewrite. Remove the backup at the end of successful refactoring or rewrite.

Debugging Transpiled Code

  • Check ./temp/_transpiled*.c for the generated C code from the last Lambda script execution (C2MIR path only; the MIR direct path produces no intermediate output)
  • Useful for debugging type mismatches, boxing/unboxing issues, and function call generation
  • Shows how Lambda expressions map to C runtime calls (e.g., fn_eq(), list_push(), i2it())

MIR JIT Workarounds

INT64 Double-Boxing Problem

The core challenge in the MIR transpiler is that transpile_expr() returns inconsistent representations for INT64 values:

| Source | Returns | Form |
|---|---|---|
| INT64 literal | boxed Item | via emit_load_const_boxed |
| fn_int64(x) call | raw int64 | via POST_PROCESS_INT64 unboxing |
| INT64 binary (e.g., a + b) | boxed Item | generic fallback through fn_add |
| System func returning INT64 | boxed Item | fn_sum, fn_min1, etc. return Item |

When a boxed INT64 Item is passed to push_l() (which expects a raw int64), it allocates a new GC nursery entry with the tagged pointer as the "value", producing garbage.

Solution: push_l_safe() detects already-boxed Items by checking the high byte tag before allocating:

Item push_l_safe(int64_t val) {
    uint8_t tag = (uint64_t)val >> 56;
    if (tag == LMD_TYPE_INT64) return (Item){.item = (uint64_t)val};  // already boxed
    if (tag == LMD_TYPE_INT)   { /* extract int56, re-box as INT64 */ }
    return push_l(val);  // raw value, box normally
}

False positive range: raw int64 values in [4·2^56, 5·2^56), roughly 2.88e17 to 3.60e17, have high byte = 4 (LMD_TYPE_INT64), so push_l_safe treats them as already boxed. In practice, INT64 values in this range are rare, but this is a known limitation.
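The false positive follows directly from the tag test. A minimal sketch of just that test, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

#define SK_TYPE_INT64 4

/* the high-byte test used to decide whether a word "looks" like a boxed INT64 */
static int sk_looks_boxed_int64(int64_t val) {
    return (uint8_t)((uint64_t)val >> 56) == SK_TYPE_INT64;
}
```

Small values and negative values have a high byte of 0x00 or 0xFF and are classified correctly. A raw value such as 300000000000000000 (3.0e17) lies between 4·2^56 and 5·2^56, so its high byte is 4 and it is misread as an already boxed item.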

POST_PROCESS_INT64 Macro

When a system function returns a boxed Item (c_ret_tid == LMD_TYPE_ANY) but the AST type inference says the result should be INT64 (call_expr_tid == LMD_TYPE_INT64), the macro unboxes the result to a raw int64 for consistent native handling in subsequent INT64 operations:

#define POST_PROCESS_INT64(result) \
    if (c_ret_tid == LMD_TYPE_ANY && call_expr_tid == LMD_TYPE_INT64) { \
        result = emit_unbox(mt, result, LMD_TYPE_INT64); \
    }

Typed Array Gap (MIR Direct)

The C2MIR transpiler fully supports typed arrays — construction, element access, and mutation all use native typed array APIs. The MIR direct transpiler always uses generic Array* and fn_index/fn_array_set. This causes behavioral differences in runtime functions like fn_sum:

| Array type | fn_sum path | Returns |
|---|---|---|
| ArrayInt (C2MIR path) | LMD_TYPE_ARRAY_INT branch | push_l(sum) → INT64 |
| Array with INT elements (MIR path) | LMD_TYPE_ARRAY branch | Depends on element types |

This mismatch was a source of bugs where sum([10,20,30]) returned different types depending on which transpiler was used.

Status: The C2MIR path now has full native typed array support including element access and mutation (see "Native Typed Array Access" section). Porting this to the MIR direct transpiler remains a future optimization.

Swap-Safe Store Functions

MIR's SSA optimizer (at level ≥ 2) can reorder assignments in while loops, breaking swap patterns like:

temp = a + b;  a = b;  b = temp;  // MIR may reorder these

The workaround uses external runtime store functions that MIR cannot inline or reorder:

| Function | Signature | Emitted For |
|---|---|---|
| _store_i64 | void _store_i64(int64_t* dst, int64_t val) | int, int64, bool assignments in while loops |
| _store_f64 | void _store_f64(double* dst, double val) | float assignments in while loops |

The transpiler emits _store_i64(&_var, value) instead of _var = value when while_depth > 0 and the target is a native scalar type. Defined in lambda-data.cpp, registered in the MIR import table in mir.c.
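The emitted pattern looks roughly like the following in C terms. This is an illustration only: `sk_store_i64` stands in for the runtime's `_store_i64`, and the Fibonacci loop is a made-up example of a swap pattern. In plain C the helper adds nothing; its value in the real system comes from being an external call that MIR does not inline or reorder.

```c
#include <assert.h>
#include <stdint.h>

/* stand-in for the external store helper: assignment through a pointer */
void sk_store_i64(int64_t *dst, int64_t val) {
    *dst = val;
}

/* swap-style loop written the way the transpiler would emit it inside a while loop */
static int64_t sk_fib(int n) {
    int64_t a = 0, b = 1;
    while (n-- > 0) {
        int64_t temp = a + b;
        sk_store_i64(&a, b);      /* instead of: a = b;    */
        sk_store_i64(&b, temp);   /* instead of: b = temp; */
    }
    return a;
}
```

Each assignment becomes a call with an observable side effect, so the order of the two stores is fixed by the call sequence.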

Module Wrapper Function Pointers

When a public function in an imported module has typed parameters or a native return type, fn_call* dispatchers cannot call it directly (ABI mismatch). The transpiler generates a _w wrapper that accepts/returns Item and unboxes/boxes internally.

For cross-module calls, these wrappers must be accessible via the module's BSS struct:

  • write_mod_struct_fields() in transpile.cpp — emits _w wrapper function pointer fields in the Mod struct alongside the original function pointers
  • init_module_import() in runner.cpp — populates wrapper pointers via find_func() using the _w-suffixed name
  • needs_fn_call_wrapper() — determines which public functions need wrapper entries (typed params, or native return with no params)
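The shape of such a wrapper can be sketched in C. All names here are hypothetical (`sk_add`, `sk_add_w`, `sk_call2`), the boxing helpers are simplified to the INT case without range checks, and the dispatcher is reduced to a single function-pointer call.

```c
#include <assert.h>
#include <stdint.h>

#define SK_TAG_INT ((uint64_t)3 << 56)
#define SK_MASK56  0x00FFFFFFFFFFFFFFULL

static uint64_t sk_box_int(int64_t v) {
    return SK_TAG_INT | ((uint64_t)v & SK_MASK56);
}

static int64_t sk_unbox_int(uint64_t item) {
    int64_t payload = (int64_t)(item & SK_MASK56);
    return (payload >> 55) ? payload - ((int64_t)1 << 56) : payload;
}

/* public function with typed parameters and a native return type */
static int64_t sk_add(int64_t a, int64_t b) { return a + b; }

/* _w-style wrapper: uniform Item-in, Item-out signature around the typed function */
static uint64_t sk_add_w(uint64_t a, uint64_t b) {
    return sk_box_int(sk_add(sk_unbox_int(a), sk_unbox_int(b)));
}

/* a dispatcher only knows the uniform signature, so it must be given the wrapper */
typedef uint64_t (*sk_item_fn2)(uint64_t, uint64_t);
static uint64_t sk_call2(sk_item_fn2 fn, uint64_t a, uint64_t b) { return fn(a, b); }
```

Passing `sk_add` itself to `sk_call2` would not even type-check in C, which mirrors the ABI mismatch described above.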

BSS Global Variables (MIR Direct)

Module-level let variables in the MIR direct transpiler are stored as MIR BSS (Block Started by Symbol) items. This allows functions defined in the same module to access module-level variables:

  • A prepass (prepass_create_global_vars) scans all top-level let nodes and creates BSS items
  • load_global_var / store_global_var emit MIR load/store instructions for BSS items
  • An in_user_func flag prevents function-internal let statements from creating BSS items
  • The GlobalVarEntry struct maps variable names to their BSS items and type metadata

MIR Direct Transpiler: Implementation Issues

This section documents issues discovered while implementing the MIR direct transpiler and the solutions adopted. These represent fundamental tensions between Lambda's dynamic type system and MIR's static SSA-based IR.

Variable Type Widening and Register Type Immutability

Problem: In MIR, a register's type is fixed at declaration (e.g., MIR_T_I64 or MIR_T_D). When Lambda code widens a variable's type at runtime — such as an int variable being assigned a float value — the register type cannot change. This breaks MIR's type expectations:

// proc example: variable starts as int, gets assigned float
var n = 10        // MIR register: MIR_T_I64
n = n / 2         // int division → assigns float, but register is still int64
if n <= 1 ...     // MIR_LE on int64 register containing a double → crash

Root cause: in transpile_assign_stam, the RHS is FLOAT but the LHS variable was declared as INT. The generated code then applies MIR_LE (integer less-or-equal) to what it treats as an int64 register, while the register actually holds a double bit pattern.

Solution: Loop-depth-dependent handling:

  • Inside loops (loop_depth > 0): Truncate float→int via MIR_D2I to preserve register type consistency. Loops require stable register types across iterations.
  • Outside loops: Box the value to ANY type via emit_box + MIR_MOV to a new int64 register. This preserves float precision at the cost of boxing overhead. The variable's MirVarEntry is updated: var->reg = boxed_reg; var->mir_type = MIR_T_I64; var->type_id = LMD_TYPE_ANY.

Implication: MIR register type immutability is a fundamental constraint. Any runtime type widening must either truncate (lossy) or box to ANY (indirect). This is the most architecturally impactful difference from the C transpiler, which uses C variables that can be freely reassigned.

Bitwise Function Argument Convention Mismatch

Problem: Bitwise functions (fn_band, fn_bor, fn_bnot, fn_shl, fn_shr) expect native int64_t arguments, but the MIR transpiler's generic system function dispatch path passes boxed Item values via transpile_box_item. This produced incorrect results (operations on tagged pointers instead of raw integers).

Note: fn_bxor worked by coincidence — XOR of two identically-tagged values cancels the tag bits, producing the correct result.

Solution: Added dedicated handling in transpile_call for bitwise functions (before the generic sys func dispatch). These functions use transpile_expr (native values) instead of transpile_box_item (boxed Items). If an argument's effective type is ANY (e.g., a captured variable), it is unboxed via emit_unbox before the call.

Underlying issue: The SysFuncInfo table does not distinguish between functions that take boxed Items and functions that take native C types. A NativeArgConvention field would eliminate this class of bugs (see Suggestion #8).
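The effect of passing boxed items to native bitwise functions, including the XOR coincidence, can be reproduced with a few lines of C. The helper names are hypothetical, and the sketch only models the INT tag (3) on non-negative operands.

```c
#include <assert.h>
#include <stdint.h>

#define SK_TAG_INT ((uint64_t)3 << 56)
#define SK_MASK56  0x00FFFFFFFFFFFFFFULL

static uint64_t sk_box_int(int64_t v) {
    return SK_TAG_INT | ((uint64_t)v & SK_MASK56);
}

/* native-convention bitwise functions: they expect raw integers */
static int64_t sk_band(int64_t a, int64_t b) { return a & b; }
static int64_t sk_bxor(int64_t a, int64_t b) { return a ^ b; }
```

With 6 and 3 boxed as INT items, AND keeps the tag byte in the result, so the caller sees a large tagged word instead of 2. XOR cancels the two identical tag bytes and happens to return the raw 5.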

Closure Mutable Capture and Env Write-Back

Problem: Closures capture variables via an env struct allocated at closure creation time. When a captured variable is mutated inside the closure body, the env struct must be updated — otherwise the mutation is lost when the closure returns.

Three sub-issues were discovered:

1. Missing env write-back on assignment

After var x = new_value inside a closure, the new value was stored only in the local MIR register. The env struct still held the old value, so subsequent calls to the closure (or other closures sharing the same env) saw stale data.

Solution: Added env_offset field to MirVarEntry (-1 = not captured, ≥0 = byte offset in env struct). After each assignment to a captured variable, the transpiler emits:

boxed = emit_box(mt, val, type_id)
MIR_MOV  *(env_ptr + env_offset) = boxed
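The effect of the write-back can be modelled outside MIR. This is a minimal sketch, assuming the env is a flat byte buffer of 8-byte slots; VarSlot, assign_var, and env_load are hypothetical stand-ins for MirVarEntry and the emitted instructions, not the transpiler's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the transpiler's bookkeeping. */
typedef struct {
    uint64_t reg;        /* models the local MIR register */
    int      env_offset; /* -1 = not captured, >= 0 = byte offset in env */
} VarSlot;

/* Assign to a variable; captured variables also write through to the env. */
void assign_var(VarSlot *var, unsigned char *env, uint64_t boxed) {
    var->reg = boxed;
    if (var->env_offset >= 0) {
        memcpy(env + var->env_offset, &boxed, sizeof boxed);
    }
}

/* Read a captured slot back, as a second closure sharing the env would. */
uint64_t env_load(const unsigned char *env, int env_offset) {
    uint64_t v;
    memcpy(&v, env + env_offset, sizeof v);
    return v;
}

void env_write_back_demo(void) {
    unsigned char env[16] = {0};
    VarSlot x = { 0, 8 };   /* captured, lives at byte offset 8 */
    VarSlot y = { 0, -1 };  /* plain local, never captured */
    assign_var(&x, env, 42);
    assign_var(&y, env, 7);
    assert(env_load(env, 8) == 42); /* write-back is visible through the env */
    assert(env_load(env, 0) == 0);  /* uncaptured assignment leaves env alone */
}
```

Keying the write on env_offset keeps uncaptured locals on the fast path: they pay nothing beyond the register move.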

2. Boxing mismatch: typed value → ANY variable

Captured variables stored in the env struct are always boxed Item values (type ANY). When assigning a typed native value (e.g., an int64_t) to an ANY variable, the transpiler must box it first. Without this, a raw int64 was stored directly into an Item slot, producing a value with no type tag.

Solution: Added an explicit var_tid == ANY && val_tid != ANY path in transpile_assign_stam that boxes the value before the MOV.

3. Register aliasing in let bindings

let tmp = a shared the same MIR register between tmp and a. When a was subsequently mutated (a = b), tmp was also affected because both names pointed to the same register.

Solution: transpile_let_stam now copies the value to a new register via MIR_MOV (int64) or MIR_DMOV (double), ensuring each variable has its own storage.

Variable Scoping in If Branches

Problem: Variables declared inside if/else branches leaked into the outer scope, causing name collisions. For example:

let y = 100
if condition
  let y = 200    // should shadow outer y, not overwrite it
y                // should be 100, not 200

Without scope isolation, let y = 200 in the then-branch overwrote the outer y entry in the variable table, and the outer scope saw 200 after the if-statement.

Solution: transpile_if now calls push_scope(mt) before and pop_scope(mt) after each branch (both then and else). The scope stack uses a depth counter in the var table, and pop_scope removes entries added at the inner depth.
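A depth-counter scope stack of this kind can be sketched in a few lines. The table below is a hypothetical fixed-size model (VarTable, scope_push, scope_pop, scope_declare, and scope_lookup are illustrative names), not the transpiler's actual variable table.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-size variable table; names and sizes are illustrative. */
#define MAX_VARS 32

typedef struct { const char *name; long value; int depth; } ScopedVar;
typedef struct { ScopedVar vars[MAX_VARS]; int count; int depth; } VarTable;

void scope_push(VarTable *t) { t->depth++; }

/* Drop every entry declared at the current depth, then step out. */
void scope_pop(VarTable *t) {
    while (t->count > 0 && t->vars[t->count - 1].depth == t->depth) t->count--;
    t->depth--;
}

/* Returns 0 on success, -1 when the table is full. */
int scope_declare(VarTable *t, const char *name, long value) {
    if (t->count >= MAX_VARS) return -1;
    t->vars[t->count++] = (ScopedVar){ name, value, t->depth };
    return 0;
}

/* Search newest-first so inner declarations shadow outer ones. */
const ScopedVar *scope_lookup(const VarTable *t, const char *name) {
    for (int i = t->count - 1; i >= 0; i--)
        if (strcmp(t->vars[i].name, name) == 0) return &t->vars[i];
    return NULL;
}

void scope_demo(void) {
    VarTable t = {0};
    scope_declare(&t, "y", 100);
    scope_push(&t);               /* enter then-branch */
    scope_declare(&t, "y", 200);  /* shadows, does not overwrite */
    assert(scope_lookup(&t, "y")->value == 200);
    scope_pop(&t);                /* leave then-branch */
    assert(scope_lookup(&t, "y")->value == 100);
}
```

The demo mirrors the let y example above: the inner y is visible only between push and pop, and the outer y is untouched afterwards.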

get_effective_type: Runtime vs AST Types

Problem: The AST records the declared type of each expression node, but runtime operations can change a variable's effective type (e.g., type widening, captured variable boxing). Using the AST type for code generation decisions after mutations leads to incorrect boxing/unboxing.

Example: A variable declared as int but widened to ANY after assignment still has LMD_TYPE_INT in its AST node. If the transpiler uses this to decide emit_box_int, it applies integer boxing to what is actually a boxed Item, producing garbage.

Solution: get_effective_type() consults the variable's MirVarEntry::type_id for IDENT nodes, which tracks the current runtime type after any mutations, and prefers it over the AST type. As written, the override applies only when the tracked type is ANY; every other case falls back to the AST-declared type:

TypeId get_effective_type(MirTranspiler* mt, AstNode* node) {
    TypeId tid = get_type_id(node->type);  // AST-declared type
    if (node->node_type == AST_NODE_IDENT) {
        MirVarEntry* v = find_var(mt, node->str_val);
        if (v && v->type_id == LMD_TYPE_ANY) return LMD_TYPE_ANY;
    }
    return tid;
}

Proc Context Detection

Problem: Procedural scripts use pn main() with imperative statements and mutable variables. The transpiler must handle var declarations, assignment statements, and multi-statement function bodies differently from pure functional expressions.

Solution: Added in_proc flag to MirTranspiler. Detection is two-fold:

  1. transpile_func_def sets in_proc = true when processing a pn (procedure) definition
  2. transpile_content scans top-level nodes for VAR_STAM to detect implicit proc context

In proc context, transpile_content returns only the last value expression (ignoring intermediate statement results), matching the C transpiler's behavior.

Typed Array Construction

Array Type Hierarchy

Container (TypeId)
├── Array       (LMD_TYPE_ARRAY = 17)     — generic: each element is a boxed Item
├── ArrayInt    (LMD_TYPE_ARRAY_INT = 14)  — specialized: int64_t elements (int56 values stored as int64)
├── ArrayInt64  (LMD_TYPE_ARRAY_INT64 = 15) — specialized: int64_t elements (full 64-bit)
└── ArrayFloat  (LMD_TYPE_ARRAY_FLOAT = 16) — specialized: double elements

All share the same struct layout: TypeId, items*, length, extra, capacity.

Construction APIs

| Function | Constructs | Element type |
| --- | --- | --- |
| array() | generic Array* | boxed Item |
| array_int() | ArrayInt* | int64_t (int56 values) |
| array_int64() | ArrayInt64* | int64_t (full range) |
| array_float() | ArrayFloat* | double |
| array_fill(arr, n, v1, v2, ...) | fills generic Array | boxed Items |
| array_int_fill(arr, n, v1, v2, ...) | fills ArrayInt | raw int64 values |
| array_int64_fill(arr, n, v1, v2, ...) | fills ArrayInt64 | raw int64 values |
| array_float_fill(arr, n, v1, v2, ...) | fills ArrayFloat | raw double values |

Type Selection at Compile Time

The C2MIR transpiler checks TypeArray::nested->type_id at compile time to select the appropriate typed array constructor:

bool is_int_array   = nested->type_id == LMD_TYPE_INT;     // → array_int()
bool is_int64_array = nested->type_id == LMD_TYPE_INT64;   // → array_int64()
bool is_float_array = nested->type_id == LMD_TYPE_FLOAT;   // → array_float()
// otherwise → generic array()

Impact on Runtime Behavior

System functions dispatch on the runtime TypeId of arrays. Using generic Array* vs typed arrays leads to different code paths in functions like fn_sum, fn_min1, fn_max1. The typed array paths are generally simpler and less error-prone because elements are stored in their native C type, which avoids boxing/unboxing ambiguities.

Native Typed Array Access and Mutation

When a variable or parameter has a typed array annotation (int[], float[], int64[]), the C2MIR transpiler emits native access/mutation calls instead of generic dispatch:

| Annotation | Declaration | Element Read | Element Write |
| --- | --- | --- | --- |
| int[] | ArrayInt* _v = (ArrayInt*)ensure_typed_array(...) | array_int_get(_v, idx) | array_int_set((ArrayInt*)_v, idx, raw_int64) |
| float[] | ArrayFloat* _v = (ArrayFloat*)ensure_typed_array(...) | array_float_get(_v, idx) | array_float_set((ArrayFloat*)_v, idx, raw_double) |
| int64[] | ArrayInt64* _v = (ArrayInt64*)ensure_typed_array(...) | array_int64_get(_v, idx) | (generic fallback) |
| (none) | Item _v = ... | fn_index((Item)_v, boxed_idx) | fn_array_set((Array*)_v, idx, boxed_val) |

Performance benefit: Native setters (array_int_set, array_float_set) take raw int64_t/double values, bypassing Item boxing entirely. Native getters avoid the type dispatch overhead of fn_index.

TypeUnary Resolution

Variables declared as var bx:int[] have AST type TypeUnary (type_id=LMD_TYPE_TYPE, kind=TYPE_KIND_UNARY). The transpiler resolves this to the effective array TypeId before index operations:

// in transpile_index_expr and transpile_index_assign_stam:
if (object_type == LMD_TYPE_TYPE && type->kind == TYPE_KIND_UNARY) {
    TypeUnary* unary = (TypeUnary*)type;
    Type* operand = unary->operand;  // unwrap TypeType wrapper if present
    if (operand->type_id == LMD_TYPE_INT)   object_type = LMD_TYPE_ARRAY_INT;
    if (operand->type_id == LMD_TYPE_FLOAT) object_type = LMD_TYPE_ARRAY_FLOAT;
    // etc.
}

This enables the existing fast paths (array_int_get, array_float_get) to match.

Function Parameter Annotations

Function parameters also support typed array annotations: pn advance(bx: float[], by: float[], ...). The implementation requires:

  1. build_param_expr (build_ast.cpp): Stores the full TypeUnary in TypeParam::full_type (not just the base Type fields), preserving the operand pointer.
  2. Identifier lookup (build_ast.cpp): When referencing a parameter with TYPE_KIND_UNARY, uses full_type as the identifier's type so the transpiler can resolve element types.
  3. has_typed_params (transpile.cpp): Recognizes LMD_TYPE_TYPE + TYPE_KIND_UNARY as a typed parameter, enabling unboxed call signatures (ArrayInt* instead of Item).
  4. write_type (print.cpp): Emits ArrayInt*/ArrayFloat*/ArrayInt64* for TypeUnary parameters in function signatures.

Runtime Coercion

ensure_typed_array(Item, TypeId) converts generic Array/List to typed arrays at runtime:

  • Pass-through if already the correct typed array
  • Extracts elements via it2i()/it2d() to build a new typed array
  • Called at variable declaration and function entry for annotated parameters
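The coercion steps listed above can be sketched on a toy model. Everything here is an assumption for illustration: the type ids, the tag value, the ToyArray struct, and the toy_* helpers are hypothetical, and malloc stands in for the GC heap that the real runtime uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified model; ids, tags, and field names are illustrative. */
enum { TOY_ARRAY = 1, TOY_ARRAY_INT = 2 };
#define TOY_INT_TAG      UINT64_C(3)
#define TOY_PAYLOAD_MASK ((UINT64_C(1) << 56) - 1)

typedef struct {
    int      type_id;
    int64_t *items;   /* TOY_ARRAY: tagged words; TOY_ARRAY_INT: raw values */
    size_t   length;
} ToyArray;

int64_t toy_box_int(int64_t v) {
    return (int64_t)((TOY_INT_TAG << 56) | ((uint64_t)v & TOY_PAYLOAD_MASK));
}

/* Stand-in for it2i(): drop the tag and sign-extend the 56-bit payload. */
int64_t toy_unbox_int(int64_t boxed) {
    uint64_t p = (uint64_t)boxed & TOY_PAYLOAD_MASK;
    if (p & (UINT64_C(1) << 55)) p |= ~TOY_PAYLOAD_MASK;
    return (int64_t)p;
}

/* Pass through when already typed; otherwise build a new typed array. */
ToyArray *toy_ensure_int_array(ToyArray *src) {
    if (src->type_id == TOY_ARRAY_INT) return src;
    ToyArray *dst = malloc(sizeof *dst);
    if (!dst) return NULL;
    dst->items = malloc(sizeof *dst->items * (src->length ? src->length : 1));
    if (!dst->items) { free(dst); return NULL; }
    dst->type_id = TOY_ARRAY_INT;
    dst->length = src->length;
    for (size_t i = 0; i < src->length; i++)
        dst->items[i] = toy_unbox_int(src->items[i]);
    return dst;
}

void ensure_typed_array_demo(void) {
    int64_t boxed[2] = { toy_box_int(7), toy_box_int(-2) };
    ToyArray generic = { TOY_ARRAY, boxed, 2 };
    ToyArray *typed = toy_ensure_int_array(&generic);
    assert(typed && typed != &generic && typed->type_id == TOY_ARRAY_INT);
    assert(typed->items[0] == 7 && typed->items[1] == -2);
    assert(toy_ensure_int_array(typed) == typed);  /* pass-through */
    free(typed->items);
    free(typed);
}
```

Because the typed case returns the input pointer unchanged, calling the coercion at every function entry costs only a type check once the caller already passes a typed array.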

Lambda Syntax Examples

// typed array variable
var bx:int[] = make_array(100, 0)
bx[i] = bx[i] + 1            // emits: array_int_set/array_int_get

// typed array function parameter
pn advance(bx: float[], by: float[], dt) {
    bx[i] = bx[i] + dt * vx[i]  // emits: array_float_set/array_float_get
}

Suggestions: Making the Runtime More Structured and Easier to Transpile

Based on implementing the MIR direct transpiler to feature-completeness (113/113 tests passing), debugging INT64 boxing issues, closure mutation, type widening, and reconciling behavior between the two transpiler paths, here are architectural improvements that would reduce friction for transpiler authors and eliminate classes of bugs.

1. Establish a Canonical Value Representation Contract

Problem: transpile_expr() in the MIR direct path returns either a raw native value (int64, double) or a boxed Item depending on the expression form. The caller must track which form it received, often incorrectly.

Suggestion: Define a clear contract for each expression's return representation:

| Expression type | Returns | Guaranteed by |
| --- | --- | --- |
| INT literal, INT binary op | raw int64_t | transpile_expr |
| INT64 literal, INT64 binary op | boxed Item (tagged INT64) | transpile_expr |
| FLOAT literal, FLOAT binary op | raw double | transpile_expr |
| System func call | Item (always boxed) | fn_* functions |
| Variable load | matches variable's declared type | transpile_expr |

The current issue is that INT64 sometimes produces raw int64 (literals) and sometimes boxed Item (binary ops via generic fallback). Pick one and be consistent. Recommendation: always return boxed Item for INT64, since most system functions already return boxed Items, and push_l_safe exists as a safety net.

2. Data-Driven C Return Type in SysFuncInfo

Problem: The transpiler uses a hardcoded switch in transpile_box_item() to decide how to box system function return values (e.g., fn_add returns int64 when both args are INT, Item otherwise). Adding or changing a system function requires updating this switch.

Suggestion: Extend SysFuncInfo with a c_ret_type field that precisely describes the C-level return semantics:

enum CRetType {
    C_RET_ITEM,     // returns boxed Item (default, safe)
    C_RET_INT64,    // returns raw int64_t (needs emit_box_int64 to produce Item)
    C_RET_DOUBLE,   // returns raw double (needs emit_box_float to produce Item)
    C_RET_BOOL,     // returns raw int64_t 0/1 (needs emit_box_bool)
    C_RET_ADAPTIVE, // return type depends on argument types (fn_add, fn_mul, etc.)
};

For C_RET_ADAPTIVE functions, a separate per-function handler can inspect argument TypeIds and determine the actual return type. This moves the logic from ad-hoc switch cases into a structured, extensible system.
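One possible shape for such a handler is sketched below. The CRetType enum repeats the one proposed above; ToyTypeId, ToySysFuncInfo, AdaptiveRetFn, add_ret_type, and resolve_ret_type are hypothetical names, and the only rule encoded is the fn_add behavior stated in the problem description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type ids; the real TypeId values live in lambda.hpp. */
typedef enum { T_INT, T_INT64, T_FLOAT, T_ANY } ToyTypeId;

typedef enum { C_RET_ITEM, C_RET_INT64, C_RET_DOUBLE, C_RET_BOOL, C_RET_ADAPTIVE } CRetType;

/* Assumed handler signature for C_RET_ADAPTIVE entries. */
typedef CRetType (*AdaptiveRetFn)(ToyTypeId a, ToyTypeId b);

/* fn_add per the text: raw int64 when both args are INT, Item otherwise. */
CRetType add_ret_type(ToyTypeId a, ToyTypeId b) {
    return (a == T_INT && b == T_INT) ? C_RET_INT64 : C_RET_ITEM;
}

typedef struct {
    const char   *name;
    CRetType      c_ret_type;
    AdaptiveRetFn adaptive;   /* NULL unless c_ret_type == C_RET_ADAPTIVE */
} ToySysFuncInfo;

/* Resolve the concrete return convention for one call site. */
CRetType resolve_ret_type(const ToySysFuncInfo *f, ToyTypeId a, ToyTypeId b) {
    if (f->c_ret_type != C_RET_ADAPTIVE) return f->c_ret_type;
    /* A missing handler falls back to the safe default: boxed Item. */
    return f->adaptive ? f->adaptive(a, b) : C_RET_ITEM;
}

void adaptive_ret_demo(void) {
    ToySysFuncInfo add = { "fn_add", C_RET_ADAPTIVE, add_ret_type };
    ToySysFuncInfo len = { "fn_len", C_RET_INT64, NULL };
    assert(resolve_ret_type(&add, T_INT, T_INT) == C_RET_INT64);
    assert(resolve_ret_type(&add, T_INT, T_FLOAT) == C_RET_ITEM);
    assert(resolve_ret_type(&len, T_ANY, T_ANY) == C_RET_INT64);
}
```

With this layout, adding a system function means adding one table row (and, for adaptive functions, one small handler) instead of touching the boxing switch.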

3. Typed Array Construction in MIR Direct Transpiler

Problem: The MIR direct transpiler always creates generic Array*, even when element types are known at compile time. This causes runtime behavior differences with the C2MIR path.

Status: The C2MIR path now has native typed array support for variables and function parameters annotated with int[], float[], or int64[]: construction (array_int()), element access (array_int_get()), and mutation (array_int_set()). Writes to int64[] still take the generic fallback, as shown in the table in the "Native Typed Array Access and Mutation" section.

Remaining: Port the same typed array support to the MIR direct transpiler. When TypeArray::nested->type_id is known:

  • Emit array_int() / array_int_fill() for INT elements
  • Emit array_int64() / array_int64_fill() for INT64 elements
  • Emit array_float() / array_float_fill() for FLOAT elements
  • Add TypeUnary resolution in transpile_index and index assignment for native access/mutation
  • Fall back to generic array() / fn_index / fn_array_set otherwise

This eliminates behavioral divergence and enables the faster typed-array code paths in runtime functions.

4. Idempotent Boxing for All Numeric Types

Problem: push_l_safe was created as a workaround for INT64 double-boxing. The same class of bug could affect FLOAT and DATETIME boxing in the future if the MIR direct transpiler's value tracking is imprecise.

Suggestion: Create push_d_safe() and push_k_safe() analogous to push_l_safe():

Item push_d_safe(double val) {
    // Check if val's int64 reinterpretation has LMD_TYPE_FLOAT tag
    uint64_t bits;
    memcpy(&bits, &val, 8);
    uint8_t tag = bits >> 56;
    if (tag == LMD_TYPE_FLOAT) return (Item){.item = bits};  // already boxed
    return push_d(val);
}

Note: This check can produce false positives. Any legitimate double whose top byte happens to equal LMD_TYPE_FLOAT would be treated as already boxed and returned unchanged. An alternative is a registry-based approach where push_d records recently allocated pointers for quick lookup.
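The false-positive risk can be made concrete with a small experiment. The sketch assumes IEEE 754 64-bit doubles and a hypothetical tag value of 5 (the real LMD_TYPE_FLOAT constant is not shown in this document); looks_boxed and colliding_double are illustrative helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag value; the real LMD_TYPE_FLOAT constant may differ. */
#define TOY_FLOAT_TAG 5

/* The heuristic from push_d_safe: does the top byte equal the float tag? */
int looks_boxed(double val) {
    uint64_t bits;
    memcpy(&bits, &val, sizeof bits);
    return (bits >> 56) == TOY_FLOAT_TAG;
}

/* Build an ordinary double whose bit pattern happens to start with the tag. */
double colliding_double(void) {
    uint64_t bits = ((uint64_t)TOY_FLOAT_TAG << 56) | 1;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

void false_positive_demo(void) {
    double d = colliding_double();
    assert(d == d && d > 0.0);  /* not NaN, a genuine positive value */
    assert(looks_boxed(d));     /* yet the heuristic calls it "already boxed" */
    assert(!looks_boxed(1.0));  /* common values are classified correctly */
}
```

Under this assumed tag the colliding values are extremely small positive doubles, so collisions would be rare in practice, but the heuristic cannot rule them out.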

5. Uniform Runtime Function Signatures

Problem: Some runtime functions return native types (e.g., fn_len returns int64_t), while others return boxed Items. The transpiler must know each function's return convention to handle results correctly.

Suggestion: Standardize on two categories of runtime function signatures:

| Category | Signature | When to use |
| --- | --- | --- |
| Item functions | Item fn_foo(Item a, Item b, ...) | Default. Safe, handles any type. |
| Native functions | int64_t fn_foo_i(int64_t a) | Performance-critical, type-specialized. |

For each function that currently returns a native type, provide a parallel _item variant that returns boxed Item. The transpiler can then always call the _item variant for safety, or call the native variant when it can prove the types match. This decouples correctness from optimization.

6. Centralized Type Narrowing Table

Problem: Type narrowing logic (e.g., "if both args to fn_add are INT, result is INT") is scattered across both transpilers in ad-hoc switch/if chains.

Suggestion: Create a centralized narrowing table:

struct TypeNarrowEntry {
    SysFunc func_id;
    TypeId  arg1_type;
    TypeId  arg2_type;
    TypeId  result_type;
    CRetType c_ret;
};

Both transpilers consult this table to determine the output type and C return convention for any system function call. Changes to narrowing rules require editing only one table, not two codebases.
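A lookup over such a table could look like the following sketch. The struct mirrors TypeNarrowEntry with toy enums; only the fn_add INT/INT row comes from the text, and the FLOAT/FLOAT row is an assumed example added to show a second entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy enums; real ids come from the runtime headers. */
typedef enum { T_INT, T_INT64, T_FLOAT, T_ANY } ToyTypeId;
typedef enum { F_ADD, F_MUL } ToySysFunc;
typedef enum { RET_ITEM, RET_INT64, RET_DOUBLE } ToyCRet;

typedef struct {
    ToySysFunc func_id;
    ToyTypeId  arg1_type;
    ToyTypeId  arg2_type;
    ToyTypeId  result_type;
    ToyCRet    c_ret;
} ToyNarrowEntry;

static const ToyNarrowEntry NARROW_TABLE[] = {
    { F_ADD, T_INT,   T_INT,   T_INT,   RET_INT64  }, /* from the text */
    { F_ADD, T_FLOAT, T_FLOAT, T_FLOAT, RET_DOUBLE }, /* assumed, illustrative */
};

/* Returns the matching rule, or NULL when no narrowing applies. */
const ToyNarrowEntry *narrow_lookup(ToySysFunc f, ToyTypeId a, ToyTypeId b) {
    size_t n = sizeof NARROW_TABLE / sizeof NARROW_TABLE[0];
    for (size_t i = 0; i < n; i++) {
        const ToyNarrowEntry *e = &NARROW_TABLE[i];
        if (e->func_id == f && e->arg1_type == a && e->arg2_type == b) return e;
    }
    return NULL;
}

void narrow_demo(void) {
    const ToyNarrowEntry *e = narrow_lookup(F_ADD, T_INT, T_INT);
    assert(e && e->result_type == T_INT && e->c_ret == RET_INT64);
    assert(narrow_lookup(F_ADD, T_INT, T_ANY) == NULL); /* caller falls back to Item */
    assert(narrow_lookup(F_MUL, T_INT, T_INT) == NULL); /* no rule registered */
}
```

A NULL result is the signal to use the generic boxed path, so an incomplete table degrades performance but not correctness.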

7. Runtime-Validated Boxing in Debug Mode

Problem: Boxing bugs are silent — they produce incorrect values rather than crashes, making them hard to detect.

Suggestion: In debug builds, add validation assertions to boxing functions:

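Item push_l_debug(int64_t val) {
    // Assert val doesn't already carry the INT64 tag in its high byte.
    // Comparing the high byte against 0 would also reject every negative value (high byte 0xFF).
    assert((uint8_t)((uint64_t)val >> 56) != LMD_TYPE_INT64 && "push_l called with already-tagged value");
    return push_l(val);
}
// Define the macro only after push_l_debug, so the call above still reaches the real push_l.
#ifdef DEBUG
#define push_l(val) push_l_debug(val)
#endif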

This catches double-boxing at the point of occurrence rather than downstream when wrong values appear.

8. Native Argument Convention in SysFuncInfo

Problem: Most system functions accept boxed Item arguments, but some (notably bitwise functions fn_band, fn_bor, fn_bnot, fn_shl, fn_shr) expect native int64_t arguments. The transpiler has no way to distinguish these two conventions from the SysFuncInfo table, requiring hardcoded special-case handling for each such function.

Discovered via: Bitwise operators produced incorrect results because the generic dispatch path boxed arguments before passing them. fn_bxor worked by coincidence (XOR of identically-tagged values cancels the tag bits).

Suggestion: Add a c_arg_convention field to SysFuncInfo:

enum CArgConvention {
    C_ARG_ITEM,     // arguments are boxed Items (default)
    C_ARG_NATIVE,   // arguments are native C types (int64_t, double)
};

The transpiler would consult this field instead of maintaining a list of special-cased function names. This is orthogonal to c_ret_type (Suggestion #2) — a function can take native args and return a boxed Item, or vice versa.

9. First-Class Variable Type Tracking

Problem: The MIR transpiler tracks variable types in MirVarEntry::type_id, but this is separate from the AST's type annotations. After mutations (type widening, closure capture boxing), the AST type becomes stale. The get_effective_type() function bridges this gap, but it only checks for ANY — it doesn't handle all possible type changes.

Discovered via: After type widening from INT to ANY (outside loops), subsequent code still used the AST's INT type for boxing decisions, producing emit_box_int on an already-boxed Item.

Suggestion: Make variable type tracking first-class with a unified VarTypeState:

struct VarTypeState {
    TypeId declared_type;  // original AST declaration
    TypeId current_type;   // updated after each assignment
    MIR_type_t mir_type;   // MIR register type (fixed after declaration)
    bool is_captured;      // part of a closure env
    int env_offset;        // byte offset in env struct (-1 if not captured)
};

All type-dependent decisions (boxing, unboxing, comparison instruction selection) should consult current_type rather than the AST node's type. This eliminates the need for get_effective_type() as a separate concern and makes type tracking explicit.
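A minimal sketch of how such a state might be updated and consulted follows. It models only the outside-loop widening rule described earlier (mismatch boxes to ANY in an int64 register); the Toy* types and the helper names var_note_assign and needs_boxing_for_item_slot are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy enums standing in for TypeId and MIR_type_t. */
typedef enum { T_INT, T_FLOAT, T_ANY } ToyTypeId;
typedef enum { REG_I64, REG_D } ToyRegType;

typedef struct {
    ToyTypeId  declared_type;  /* original AST declaration */
    ToyTypeId  current_type;   /* updated after each assignment */
    ToyRegType mir_type;       /* type of the register currently bound */
    bool       is_captured;
    int        env_offset;     /* -1 if not captured */
} ToyVarTypeState;

/* Record an assignment outside a loop: a type mismatch widens to ANY. */
void var_note_assign(ToyVarTypeState *v, ToyTypeId val_type) {
    if (v->current_type == T_ANY || v->current_type == val_type) return;
    v->current_type = T_ANY;
    v->mir_type = REG_I64;  /* models rebinding to a new int64 register holding a boxed Item */
}

/* Boxing decision keyed on current_type, never on declared_type. */
bool needs_boxing_for_item_slot(const ToyVarTypeState *v) {
    return v->current_type != T_ANY;
}

void var_type_state_demo(void) {
    ToyVarTypeState x = { T_INT, T_INT, REG_I64, false, -1 };
    assert(needs_boxing_for_item_slot(&x));   /* raw int must be boxed */
    var_note_assign(&x, T_FLOAT);             /* widening assignment */
    assert(x.declared_type == T_INT);         /* the AST view is now stale */
    assert(x.current_type == T_ANY);
    assert(!needs_boxing_for_item_slot(&x));  /* already a boxed Item */
}
```

The demo reproduces the bug class described above: a decision based on declared_type would box a second time, while one based on current_type does not.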

10. Document MIR Register Type Constraints

Problem: MIR registers have a fixed type at declaration (MIR_T_I64 or MIR_T_D). This is inherent to MIR's typed-register model, and it has far-reaching consequences for Lambda's dynamic typing that are not obvious to developers working on the transpiler for the first time.

Key constraints:

  • A variable declared as int (register type MIR_T_I64) cannot later hold a double natively
  • Loop variables must maintain stable register types across iterations (no widening inside loops)
  • Type widening outside loops requires boxing to ANY (the variable is rebound to a new MIR_T_I64 register holding a boxed Item; the original register keeps its type)
  • The C transpiler has no equivalent constraint — C variables can be freely reassigned

Suggestion: Add a "MIR Register Type Constraints" reference section to this document (or as a comment block in transpile-mir.cpp) that explicitly lists these constraints. New code that assigns to variables should always check for type mismatches and route through the appropriate widening path. Consider adding a WIDENING_ASSERT macro:

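// var_name is passed explicitly so the macro does not depend on a local of that name at the call site
#define WIDENING_ASSERT(var, var_name, val_tid) \
    do { if ((var)->type_id != LMD_TYPE_ANY && (var)->type_id != (val_tid)) \
        log_error("mir: type widening %s: %d -> %d", (var_name), (int)(var)->type_id, (int)(val_tid)); \
    } while(0)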

Summary of Priorities

| Priority | Improvement | Impact | Effort |
| --- | --- | --- | --- |
| High | Typed arrays in MIR Direct (#3) | Eliminates behavioral divergence | Medium |
| High | Data-driven C return type (#2) | Simplifies transpiler, prevents bugs | Medium |
| High | Debug-mode boxing validation (#7) | Catches bugs early | Low |
| High | Native arg convention (#8) | Eliminates bitwise/native arg bugs | Low |
| Medium | First-class variable type tracking (#9) | Prevents stale type bugs | Medium |
| Medium | Canonical value representation (#1) | Reduces INT64 confusion | High (refactor) |
| Medium | Centralized type narrowing (#6) | Single source of truth | Medium |
| Medium | Document register type constraints (#10) | Prevents widening bugs | Low |
| Low | Idempotent boxing (#4) | Safety net | Low |
| Low | Uniform function signatures (#5) | Cleaner API | High (many functions) |