LambdaJS

June 16, 2026

Part of the LambdaJS detailed-design set. This document is the cross-cutting performance catalog. It records the optimizations that exist in the engine today (each grounded in code with a file:line anchor), summarizes the benchmark findings from the development tuning logs, and points to the sibling doc that owns each mechanism in full detail. It does not re-derive any mechanism — for the how, follow the link.

Primary sources (mechanisms):
  • lambda/js/js_runtime_function.cpp (transient args stack)
  • lambda/js/js_mir_expression_lowering.cpp (constant folding, native arithmetic, const-bound dispatch)
  • lambda/js/js_mir_calls_boxing_types.cpp (boxing/native predicates)
  • lambda/js/js_mir_function_collection_class_inference.cpp (dual-version inference)
  • lambda/js/js_runtime.cpp (js_map_get_fast, js_get_shaped_slot, shape cache, regex cursor)
  • lambda/lambda-data.hpp (TypeMap hash + slot_entries)
  • lambda/js/js_mir_entrypoints_require.cpp (interpreter-vs-JIT selection)
  • lambda/js/js_typed_array.cpp (raw bulk paths)
  • lambda/js/js_globals.cpp (ASCII interning)
  • lambda/sys_func_registry.c (import table)

Primary sources (numbers): the vibe/jube/Transpile_Js_Tune*.md and Transpile_Js2{6,7,8}*.md development logs.

Audience: engine developers.

Convention: file:line references drift; confirm against the symbol name.


1. Purpose & scope

This document catalogs what is optimized and how well it measured, so a reader can find the leverage points without re-reading every lowering file. The mechanisms are verified against current code; the numbers are not.

Read every absolute figure in this document as development-time and configuration-dependent. The benchmark tables come from the tuning logs, captured at various commits on Apple Silicon hardware against a moving engine, frequently with explicit notes that per-test wall-clock carries roughly a 15% run-to-run noise floor and that machine load contaminated some runs. They are recorded here to show the shape and direction of each optimization's effect, not as a current performance guarantee. Where a log retracted or reverted a change, that is noted — several plausible optimizations measured net-negative and were removed, and that history is as load-bearing as the wins. Treat ratios ("≈50×", "−39%") as "this is the kind of effect this lever had on that workload," and re-measure on a release build before relying on any of them.

The value representation these optimizations operate on (the tagged Item, the boxed-by-default emission model, the native INT/FLOAT fast paths) is owned by JS_03 — Value Model and JS_04 — MIR Lowering; the compilation phases and interpreter/JIT selection are in JS_01 — Compilation Pipeline.


2. Optimization catalog

Grouped by tier. Each entry gives the mechanism (with a code anchor), the doc that owns it, and the measured effect where a log recorded one.

2.1 Call & dispatch tier

  • Transient call-argument stack. Call lowering reserves argument slots from a single bump stack — js_args_push / js_args_save / js_args_restore (js_runtime_function.cpp:65, :86, :92) over a fixed 256K-Item region registered with the GC exactly once (:59, :72), with a fall-back to the old per-call js_alloc_env only on pathological depth (:77). This replaced a per-call pool allocation that registered a permanent GC root range and was never freed, which made call-heavy loops O(n²) in both registration and GC marking. Owner: JS_04 §8, JS_03. Measured (Tune1): a 160k-iteration dynamic-call loop fell from 6008 ms to 12.5 ms and call-loop scaling went from quadratic to linear; an assert.sameValue ×65536 loop fell from 4154 ms to 83 ms (≈50×); a regex character-class test from 9.6 s to 0.22 s.
  • Const-bound static dispatch. The resolver in jm_transpile_call emits a direct MIR call (skipping the dynamic js_call_function path and the args buffer) not only for function f(){} declarations but also for a const-bound function expression or arrow whose call site is textually after the initializer — gated on the binding being immutable and the call being past its TDZ window. Owner: JS_04 §8, JS_05 — Functions & Closures. Measured (Tune1): a 2M-iteration 1-arg call dropped from ≈99 ms to ≈63 ms on const-bound forms (≈50 ns → ≈31 ns/call), matching the function-declaration baseline; the full test262 baseline summed elapsed fell ≈34%.
  • Direct method dispatch for known classes. obj.method(...) on a receiver whose class is known at compile time (a this inside a class method, or a variable from new ClassName()) resolves the method by walking the class/superclass chain at transpile time and emits a direct MIR_CALL, bypassing the property-fetch-then-js_call_function chain. Falls back to runtime dispatch when a subclass overrides the method. Owner: JS_07 — Classes, JS_05. Measured (Js26 "P3"): the AWFY OOP suite geometric-mean ratio versus V8 improved from 16.27× to 4.52×; permute, queens, list, and json each dropped roughly 11–41×; deeply polymorphic richards/deltablue saw little change because the override check correctly falls back.
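The transient call-argument stack is the easiest of these mechanisms to picture in isolation. The following is a minimal sketch, not the engine's code: ArgStack, simulate_calls, and the plain 64-bit Item stand-in are hypothetical names, and GC registration is only described in comments. It shows why a save/push/restore bracket keeps the live region bounded by the deepest single call rather than by the number of calls made.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Item = std::uint64_t;  // hypothetical stand-in for the engine's tagged value

// Hypothetical model of a transient argument stack: one fixed region,
// allocated (and, in a real engine, registered with the GC) exactly once.
class ArgStack {
public:
    explicit ArgStack(std::size_t capacity) : slots_(capacity), top_(0) {}

    // Reserve `count` contiguous slots. Returns nullptr when the region is
    // exhausted so the caller can fall back to a slower heap allocation.
    Item* push(std::size_t count) {
        if (count > slots_.size() - top_) return nullptr;
        Item* base = slots_.data() + top_;
        top_ += count;
        return base;
    }

    // save/restore bracket a call: restore releases every slot pushed since
    // the matching save in O(1), with no per-call registration.
    std::size_t save() const { return top_; }
    void restore(std::size_t mark) {
        assert(mark <= top_);
        top_ = mark;
    }

    std::size_t used() const { return top_; }

private:
    std::vector<Item> slots_;
    std::size_t top_;
};

// Simulate `iterations` sequential calls that each pass `argc` arguments and
// report the high-water mark of the stack.
inline std::size_t simulate_calls(ArgStack& stack, int iterations, std::size_t argc) {
    std::size_t high_water = 0;
    for (int i = 0; i < iterations; ++i) {
        std::size_t mark = stack.save();
        Item* args = stack.push(argc);
        if (args != nullptr) {
            for (std::size_t k = 0; k < argc; ++k) args[k] = static_cast<Item>(i);
        }
        if (stack.used() > high_water) high_water = stack.used();
        stack.restore(mark);
    }
    return high_water;
}
```

With this shape, a 160k-iteration loop of 3-argument calls never occupies more than 3 slots, which is the property that turns per-call root registration (and the marking work that follows it) from a growing cost into a constant one.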

2.2 Value & codegen tier

[Figure: Value fast-path tiers]
  • Constant folding + dead-branch elimination. jm_try_fold_const (js_mir_expression_lowering.cpp:1552) is a conservative compile-time evaluator over literal/unary/binary subtrees, gated by jm_const_fold_enabled (:1543, on unless LAMBDA_JS_CONST_FOLD=0); folded values are emitted at value sites through jm_emit_folded_at_value_site (:1661) and a folded if condition lets jm_transpile_if drop the dead branch. Owner: JS_04 §7. Measured (Tune3): the shift-operator straight-line cluster (S11.7.*_A4, 12 tests) fell 4.55 s → 0.63 s (−86%) in a controlled A/B, with differential fold-vs-runtime tests confirming bit-identical results.
  • Native arithmetic & dual _n versions. Numeric chains stay unboxed in native MIR_T_I64/MIR_T_D registers whenever inference proves operand and consumer are numeric — jm_is_native_type admits exactly INT/FLOAT/BOOL (js_mir_calls_boxing_types.cpp:1125), and jm_box_native (:767) crosses back to a boxed Item only at a sink. On top of this, type inference can build a separate native-signature version of a user function: the collection/inference pass keys call rewrites off fc->has_native_version + fc->native_func_item + the per-parameter fc->param_types (js_mir_function_collection_class_inference.cpp:158, :213). Owner: JS_04 §2–4, JS_05. Measured (Js26): pure-numeric benchmarks (fannkuch, pidigits, ack, nqueens, sieve) reached or beat V8; the gap is widest exactly where values cannot stay native (see §4).
  • 1-character ASCII interning. A single-byte ASCII result of str[i] / charAt is returned from a process-wide 128-entry pool g_ascii_char_pool via js_intern_ascii_char (js_globals.cpp:4356, :4394), used by the string-index fast path (js_runtime.cpp:6410), instead of allocating a fresh String. Strings are immutable so reference identity is unobservable, making interning behaviorally identical. Owner: JS_10 — Built-ins. Measured (Tune1 §7.2.B): 2M str[i] calls −14%, 2M of a hex-format helper −12%; the only piece of that round's string-alloc proposals that survived verification (the multi-operand concat fusion and an inliner widening were both reverted).
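Constant folding with dead-branch elimination can be illustrated on a toy expression tree. This is a sketch under stated assumptions: Expr, try_fold, and branches_to_emit are hypothetical names, the operator set is reduced to add/multiply/less-than (the measured cluster in the text is the shift operators, which this sketch does not model), and the real jm_try_fold_const works on the engine's AST rather than this struct.

```cpp
#include <cmath>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical miniature AST: a numeric literal, an opaque runtime value,
// or a binary operator over two subtrees.
struct Expr {
    enum Kind { Literal, Runtime, Add, Mul, Less } kind = Literal;
    double value = 0.0;              // used by Literal only
    std::shared_ptr<Expr> lhs, rhs;  // used by binary kinds only
};

inline std::shared_ptr<Expr> lit(double v) {
    auto e = std::make_shared<Expr>();
    e->kind = Expr::Literal;
    e->value = v;
    return e;
}

inline std::shared_ptr<Expr> runtime_value() {
    auto e = std::make_shared<Expr>();
    e->kind = Expr::Runtime;
    return e;
}

inline std::shared_ptr<Expr> bin(Expr::Kind k, std::shared_ptr<Expr> l,
                                 std::shared_ptr<Expr> r) {
    auto e = std::make_shared<Expr>();
    e->kind = k;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Conservative evaluator: returns a value only when the whole subtree is
// made of literals; any runtime leaf makes the result "unknown".
inline std::optional<double> try_fold(const Expr& e) {
    if (e.kind == Expr::Literal) return e.value;
    if (e.kind == Expr::Runtime) return std::nullopt;
    auto l = try_fold(*e.lhs);
    auto r = try_fold(*e.rhs);
    if (!l || !r) return std::nullopt;
    switch (e.kind) {
        case Expr::Add:  return *l + *r;
        case Expr::Mul:  return *l * *r;
        case Expr::Less: return (*l < *r) ? 1.0 : 0.0;
        default:         return std::nullopt;
    }
}

enum class Branch { Then, Else, Both };

// A folded `if` condition selects exactly one branch to emit. Number
// truthiness follows JavaScript: 0, -0 and NaN are falsy.
inline Branch branches_to_emit(const Expr& cond) {
    auto folded = try_fold(cond);
    if (!folded) return Branch::Both;
    bool truthy = (*folded != 0.0) && !std::isnan(*folded);
    return truthy ? Branch::Then : Branch::Else;
}
```

The key design point is that the folder refuses rather than guesses: an unknown leaf anywhere in the subtree returns no value, so the lowering keeps the ordinary runtime path.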

2.3 Object & property tier

  • MapKind dispatch. Both js_property_get and js_property_set guard the exotic-object handlers behind a single map_kind != MAP_KIND_PLAIN byte test (the 4-bit field in the Container header), replacing a cascade of nine/four sentinel-pointer comparisons; pool_calloc makes every ordinary object PLAIN for free. Owner: JS_06 — Objects, Properties & Prototypes §4. Measured (Js27): performance-neutral at the runtime level — the eliminated checks were cheap, well-predicted pointer compares — but it is the dispatch substrate the later iterator and inline-guard fast paths build on.
  • Constructor shape pre-allocation + fast map lookup. js_set_class_ctor_shape_metadata (js_runtime.cpp:2526) records the constructor's this.prop field names; js_constructor_create_object_shaped_cached (:2566) captures the first instance's TypeMap* into a per-call-site cache so siblings share the blueprint; js_get_shaped_slot (:2584) reads by slot index via the O(1) slot_entries[] array on TypeMap (lambda-data.hpp:255). Ordinary named lookups go through js_map_get_fast (:2765), which probes the inline FNV-1a hash table first (lambda-data.hpp:251, capacity TYPEMAP_HASH_CAPACITY == 32) and falls back to a linear ShapeEntry walk. Owner: JS_06 §10, JS_07, JS_03. Measured (Js27 §6–7): with call-site type propagation feeding the shaped-slot path, nbody fell 741 ms → 288 ms (≈2.5×); the inline shape-pointer guard (validating all field types with one compare) added only ≈0–2% on top because js_get_slot_f is already tiny.
  • Iterator fast path (MAP_KIND_ITERATOR). Synthetic array/string/typed-array iterators carry a dedicated kind tag and a fixed 2-slot data layout (source + index) instead of named properties; js_iterator_step (js_runtime.cpp:27726) dispatches on the tag and reads/advances by direct memory offset, and js_map_get_fast short-circuits the tag without dereferencing a TypeMap (:2769). This replaces ≈20 function calls per step (two hash lookups, a name allocation, and a full property set) with ≈2. Owner: JS_06 §4, JS_08 — Iterators & Generators. Measured (Js28): ≈10× fewer calls per for-of step and +56 baseline test262 passes (the optimization plus cascading fixes); the specific arguments-mutation tests it targeted turned out to be parser-bound, not iterator-bound.
  • Dense-array write & sparse-hole fill. An indexed write (strict or non-strict) to an existing own dense slot of a plain array short-circuits via js_array_fast_own_dense_set (js_runtime.cpp:5048) before any accessor / prototype / typed-array work. This is correct because OrdinarySet writes an own writable data property directly (Tune5 §2). Gap fill on a sparse write stores the deleted-sentinel hole, not undefined (js_runtime.cpp:7199, :7323), so array methods skip holes instead of materializing a million undefined slots. Owner: JS_06 §6. Measured: Tune5 restored sieve from a ≈350× write-path regression (≈114× recovery) once the per-write snprintf + prototype walk was bypassed; Js28's hole fill cut Array.prototype.every/some and Object.isFrozen on sparse arrays from ≈3 s to single-digit milliseconds.
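The "inline hash first, linear walk second" lookup used for named properties can be sketched as follows. This is an illustrative model only: Shape, add, and find are hypothetical names, the probing scheme (linear probing, one slot kept free) is an assumption, and the real TypeMap layout in lambda-data.hpp is not reproduced. Only the 32-entry capacity and the FNV-1a hash come from the text.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

constexpr std::size_t kHashCapacity = 32;  // mirrors TYPEMAP_HASH_CAPACITY in the text

// 32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime.
inline std::uint32_t fnv1a(std::string_view s) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : s) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Hypothetical shape: an ordered field list plus a small fixed hash table.
struct Shape {
    std::vector<std::string> names;        // authoritative, in slot order
    std::array<int, kHashCapacity> table;  // slot index, or -1 when empty
    std::size_t hashed = 0;                // entries present in `table`

    Shape() { table.fill(-1); }

    int add(const std::string& name) {
        int slot = static_cast<int>(names.size());
        names.push_back(name);
        // Once the table is nearly full, later fields are simply not hashed;
        // they remain reachable through the linear walk in find().
        if (hashed + 1 < kHashCapacity) {
            std::size_t i = fnv1a(name) % kHashCapacity;
            while (table[i] != -1) i = (i + 1) % kHashCapacity;
            table[i] = slot;
            ++hashed;
        }
        return slot;
    }

    int find(std::string_view name) const {
        // Fast path: probe the inline hash table.
        std::size_t i = fnv1a(name) % kHashCapacity;
        for (std::size_t probes = 0; probes < kHashCapacity; ++probes) {
            int slot = table[i];
            if (slot == -1) break;
            if (names[static_cast<std::size_t>(slot)] == name) return slot;
            i = (i + 1) % kHashCapacity;
        }
        // Slow path: linear walk over every field.
        for (std::size_t s = 0; s < names.size(); ++s) {
            if (names[s] == name) return static_cast<int>(s);
        }
        return -1;
    }
};
```

The sketch also makes the cost cliff visible: objects with more members than the table can hold still resolve correctly, but every lookup of an unhashed member pays the full linear walk.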

2.4 Runtime builtin tier

  • TypedArray raw bulk paths. Same-type bulk copy uses memmove/memcpy and cross-type conversion hoists the element-type switch out of the loop — js_typed_array_try_raw_set_same_type (js_typed_array.cpp:230), js_typed_array_raw_copy_same_type (:430), js_typed_array_raw_copy_reversed (:469), and the constructor memcpy path (:1916), gated by LAMBDA_JS_TA_RAW_FAST (:224). Detach/out-of-bounds is validated once where no user code can run between check and access; callback methods revalidate at the spec points. Owner: JS_12 — TypedArrays. Measured (Tune4 T4-P2): compliance suites are neutral (tiny arrays, runner overhead dominates), but a bulk workload over 200k-element arrays went 0.80–1.09 s → 0.16–0.17 s, and a 400k-element numeric search 30.0 s → 0.11 s — the lever targets large data movement, not the small arrays in conformance tests.
  • Regex property-walk cursor. js_regexp_test_property_all (js_runtime.cpp:15653) threads a resumable range cursor (js_regex_sorted_range_contains_cursor, :12656) through the generated-property walk for ^\p{X}+$ / ^\P{X}+$ forms; near-monotonic input advances the cursor in O(1), collapsing a per-code-point binary search to near-linear. The cursor is engaged only for the generated gc/script/scx/binary kinds (:15674); other kinds keep the flat binary search. Owner: JS_11 — RegExp. Measured (Tune3 §2.5, kept): the generated-property test cluster (439 tests) fell 61.84 s → 37.67 s (−39%) on a quiet machine, with zero flipped exit codes across 583 property tests.
  • Sys-func registry reduction. The JIT import table in sys_func_registry.c currently holds 476 js_* entries (down from a peak past 547). Tune8 removed entries that telemetry confirmed were never emitted by any lowering file and were never folded into a dispatcher (the C functions stay linked; only the {"name", FPTR(name)} rows go), plus inverse-pair folds (js_ne_raw → js_eq_raw + an inline MIR_XOR). A smaller import table means shorter import_cache probe chains during JIT compile. Owner: JS_04 §8, JS_10. Measured (Tune8): −91 entries from the telemetry pass alone moved aggregate test262 per-test wall-clock −7.24% versus the same-HEAD baseline, at 0 regressions; the win is in compile time, not run time. MIR cannot inline through native-C imports (process_inlines only inlines MIR_func_items), so a wide single dispatcher would pay its switch on every call, which is why hot entries (js_property_set, js_property_get, js_add, js_check_exception) deliberately stay direct.
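The resumable range cursor from the regex entry can be sketched against a plain binary search. This is a minimal model, assuming sorted, non-overlapping, inclusive code-point ranges; Range, RangeCursor, contains_binary, and contains_cursor are hypothetical names, not the signature of js_regex_sorted_range_contains_cursor.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Inclusive code-point range. Tables are assumed sorted and non-overlapping.
struct Range {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Baseline: an independent binary search per code point, O(log n) each.
inline bool contains_binary(const std::vector<Range>& ranges, std::uint32_t cp) {
    std::size_t lo = 0, hi = ranges.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (cp < ranges[mid].lo) hi = mid;
        else if (cp > ranges[mid].hi) lo = mid + 1;
        else return true;
    }
    return false;
}

// The cursor remembers which range the previous lookup ended on.
struct RangeCursor {
    std::size_t index = 0;
};

// Resume from the remembered range and step toward `cp`. When consecutive
// code points are close together, each lookup moves the cursor by zero or
// one position instead of restarting a search.
inline bool contains_cursor(const std::vector<Range>& ranges, std::uint32_t cp,
                            RangeCursor& cur) {
    if (ranges.empty()) return false;
    if (cur.index >= ranges.size()) cur.index = ranges.size() - 1;
    while (cur.index > 0 && cp < ranges[cur.index].lo) --cur.index;
    while (cur.index + 1 < ranges.size() && cp > ranges[cur.index].hi) ++cur.index;
    return cp >= ranges[cur.index].lo && cp <= ranges[cur.index].hi;
}
```

Both functions must agree on every input; the cursor only changes the cost profile. On input that jumps around at random the cursor can degrade to a linear scan, which is one plausible reason to engage it only for the large generated tables and keep the flat binary search elsewhere.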

3. Interpreter vs JIT selection

A single global, g_mir_interp_mode, selects the engine at MIR_link time — MIR_set_interp_interface builds a compact threaded-dispatch icode and emits no native code, while MIR_set_gen_interface runs MIR_gen (full codegen + register allocation) per function (js_mir_entrypoints_require.cpp:170, :730). The decisive finding from the tuning logs is that MIR_link cost is eager per-function codegen, not symbol resolution: link time tracks function count and size, and turning codegen off collapses it (Tune6 §0.2a–b). On large vendor libraries the corpus measured link at 50–82% of total compile time — lodash spent 4.8 s of its 6.7 s in link.

This drives a source-shape policy (:698): count total MIR instructions post-lowering, then select the interpreter when total_insns > JM_LARGE_MODULE_INSN_THRESHOLD (100k, any context) or, in a document context, when g_js_force_document_interp is set or total_insns > JM_RADIANT_INTERP_INSN_THRESHOLD (20k); LAMBDA_JS_LARGE_INTERP=0 overrides back to JIT. The thresholds live in js_mir_internal.hpp:23, :26. Large/cold modules interpret because the codegen they would pay for is mostly never executed — vendor JS declares thousands of functions a page never calls — so the JIT cost can never amortize. Hot compute keeps the JIT because native code pays off once amortized: a 50M-iteration loop measured ≈1.65× slower interpreted (Tune7 §1). The render/layout/view CLI commands force interpreter mode for all document JS via g_js_force_document_interp.
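The selection rule in the paragraph above reads naturally as a small pure function. The sketch below is an approximation: SelectionInput and select_engine are hypothetical names, the thresholds use the rounded 100k/20k figures quoted in the text rather than the constants in js_mir_internal.hpp, and treating LAMBDA_JS_LARGE_INTERP=0 as overriding every interpreter rule (including the forced-document flag) is an assumption the text does not settle.

```cpp
#include <cstdint>

// Rounded figures from the text; the authoritative values are the
// JM_*_INSN_THRESHOLD constants in the engine source.
constexpr std::uint64_t kLargeModuleInsnThreshold = 100000;
constexpr std::uint64_t kDocumentInterpInsnThreshold = 20000;

enum class Engine { Jit, Interpreter };

// Hypothetical bundle of the inputs the policy looks at.
struct SelectionInput {
    std::uint64_t total_insns;   // MIR instruction count after lowering
    bool document_context;       // compiling JS embedded in a document
    bool force_document_interp;  // models g_js_force_document_interp
    bool large_interp_disabled;  // models LAMBDA_JS_LARGE_INTERP=0
};

inline Engine select_engine(const SelectionInput& in) {
    // Assumption: the environment override wins over every other rule.
    if (in.large_interp_disabled) return Engine::Jit;

    // Any context: very large modules interpret.
    if (in.total_insns > kLargeModuleInsnThreshold) return Engine::Interpreter;

    // Document context: forced, or above the lower threshold.
    if (in.document_context &&
        (in.force_document_interp ||
         in.total_insns > kDocumentInterpInsnThreshold)) {
        return Engine::Interpreter;
    }
    return Engine::Jit;
}
```

Keeping the decision a function of instruction count and context, rather than of file name or origin, is what lets the same rule cover both vendor libraries and hot compute scripts.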

The logs pinned down two important nuances. First, this must use the link-interface interpreter path (MIR_set_interp_interface with the JIT generator still initialized, g_mir_interp_mode left 0), not the pure-interpreter path that skips MIR_gen_init: the latter regressed 49 interactive UI-automation tests because paths that still need the generator (eval/batch lowering) diverged (Tune6 §0.2e). Second, the per-function lazy-JIT interface MIR ships (MIR_set_lazy_gen_interface) collapses link but makes on-demand generation ≈80× costlier per function and scales ≈O(n²) at opt≥2 under interleaved generation, so it was rejected as net-negative; the interpreter, which does no per-function codegen at all, is the usable lever. Owner: JS_01 §4, §7. Measured (Tune6 §0.2e): large libraries 4–6× faster total (lodash 6.7 s → 1.3 s); the web-template Radiant suite ≈3× faster wall and CPU, at 0 test262 regressions and a green Radiant baseline.

A correctness caveat: the MIR interpreter does not perform tail-call optimization, so TCO-dependent deep recursion that passes under JIT (test/js/tco.js) overflows the stack under the interpreter (Tune7 §2) — a known engine-dependent divergence, not a flake.


4. Benchmark results vs V8/Node (development-time)

The table below summarizes the AWFY/R7RS/JetStream-style suite comparison against V8 (Node.js) from the Js26 log, as a geometric-mean LambdaJS/V8 ratio (lower is better; <1× means LambdaJS was faster). These are development-time figures on Apple Silicon from a specific commit window; they are not a current guarantee. The progression columns show how the property-access (P1/P2/P4) and method-dispatch (P3) work moved the numbers.

| Suite | Original | After P1+P2+P4b | After P3 method dispatch | Representative wins (vs V8) |
|---|---|---|---|---|
| AWFY | 25.82× | 16.27× | ≈4.52× | sieve 0.26×, permute 0.59×, queens 0.50×, list 0.89× |
| R7RS | 3.12× | 2.26× | | fannkuch 0.15×, pidigits 0.17×, ack 0.44×, tak 0.79× |
| BENG | 3.65× | 2.48× | | fannkuch, pidigits, fasta near-parity |
| KOSTYA | 6.91× | ≈4.9× | | primes 0.85×; matmul still ≈98× |
| LARCENY | 5.28× | ≈3.7× | | array1 0.31×, paraffins/primes <1× |

The pattern across all suites: pure integer/float arithmetic in locals, and simple recursion, are at or below V8; the gap concentrates in object/property-heavy and float-field-in-loop code, where boxing dominates. The Tune passes used the test262 corpus rather than V8 comparison as their signal, and reported the slow tail collapsing round by round: by Tune2 the regular timing TSV had zero tests ≥2 s, and by Tune7 the interpreter ran the 39,258-test baseline with 0 genuine failures and ≈−17% summed per-test time versus JIT (short scripts, codegen never amortized). Tune5 separately caught and fixed two severe regressions against an April baseline: an array-write accessor walk (sieve ≈350× slower, followed by a ≈114× recovery once fixed) and an object-literal CreateDataProperty path (gcbench ≈21×, deriv ≈17×, restored via a map_put fast path). Both are a reminder that these numbers move in both directions as correctness work lands.
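For readers re-measuring, the suite-level figure is a geometric mean of per-benchmark time ratios. A minimal sketch of that aggregation, assuming each ratio is engine time divided by reference time; geometric_mean is a hypothetical helper, not part of the benchmark harness.

```cpp
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark time ratios (engine time / reference time).
// Computed in log space so a long product cannot overflow or underflow.
// Returns NaN for an empty input.
inline double geometric_mean(const std::vector<double>& ratios) {
    if (ratios.empty()) return std::nan("");
    double log_sum = 0.0;
    for (double r : ratios) log_sum += std::log(r);
    return std::exp(log_sum / static_cast<double>(ratios.size()));
}
```

The geometric mean treats a 4× slowdown and a 4× speedup as cancelling (ratios 4.0 and 0.25 average to 1.0), whereas an arithmetic mean of the same two ratios would report 2.125. That is why a single extreme outlier such as a ≈98× benchmark shifts the suite figure less than its raw size suggests.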


5. Open performance gaps & regressions

Grounded in the tuning logs; these are known-open or accepted-cost, not claims of a bug.

  1. Float boxing in hot loops. Boxed float reads/writes allocate in the GC nursery (jm_box_float → push_d); V8's optimizing compiler keeps doubles unboxed in registers and in double-element arrays, so it avoids the per-operation allocation. This is the dominant residual on float-field-in-loop benchmarks (nbody, matmul, mandelbrot, spectralnorm). The class-based nbody variant is ≈2× faster than the object-literal one precisely because the shaped-slot path removes some boxing, but push_d allocation remains the next target (Js27 §7.11, Js26 §6b). Native multiply also routes INT×INT through doubles to match JS semantics (JS_04 §4), so a pure-integer hot loop pays I2D/DMUL/D2I rather than an integer multiply.
  2. arr.push override-check re-intern. arr.push(x) runs a per-call override check (js_property_get for "push" then a builtin compare) before dispatching, and js_property_get → js_get_prototype_of re-interns "Array"/"prototype" into the name pool every call. This is the profiled hot path of base64 (Tune5 §6a). The safe fix (skip the check when arr->extra == 0 and Array.prototype is pristine, with a tamper flag) and a realm-scoped intrinsic-prototype cache are both deferred; a process-global proto cache was tried and reverted because it leaked one realm's prototype to another (test262 multi-realm).
  3. Polymorphic devirtualization fallback. Direct method dispatch (§2.1) falls back to full runtime dispatch whenever a subclass overrides the target method, so deeply polymorphic hierarchies (richards, deltablue) keep the slow path; shape-based polymorphic dispatch ("P3b") is unimplemented (Js26 §P3).
  4. Conservative ADD inference loses native typing. A correctness fix made + inference conservative (a param used in x + y is no longer inferred numeric, since + is overloaded add/concat), which boxed arithmetic in additive/recursive numeric functions — ack went ≈12 ns/call → ≈208 ns/call (Tune5 §6c). The safe fix (infer ADD numeric only when both operands are provably non-string, plus fixed-point return-type inference for self-recursion) is deferred behind a 0-regression gate.
  5. Destination-passing lowering deferred. A per-opcode histogram showed emitted MIR is 66–88% data-movement MOVs (lodash 88%), from the value-returning "materialize into a temp, then MOV into the destination" style. A destination-passing rewrite (caller names the target register) could roughly halve MOVs but touches every expression-lowering path and is a deep codegen project, explicitly not scheduled (Tune6 §3.3, JS_04 "Known Issues" #1).
  6. Lazy per-function generation is non-viable. As in §3, MIR's native lazy-gen interface is ≈80× costlier per function and ≈O(n²) at opt≥2; coarse batched deferral at opt=0 is the only redesign worth revisiting, and only for compute-heavy apps that call part of their code (Tune6 §0.2b–c).
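The deferred ADD rule in item 4 is small enough to state as code. The following is a sketch of the rule as described, not the engine's inference pass: Ty, Op, and infer_binary are hypothetical names, "provably non-string" is reduced to a single Number type, and BigInt is ignored.

```cpp
// Hypothetical three-point type lattice for this sketch.
enum class Ty { Number, String, Unknown };
enum class Op { Add, Sub };

// Inferred result type of `lhs op rhs`.
inline Ty infer_binary(Op op, Ty lhs, Ty rhs) {
    // `-` is not overloaded: it always produces a number
    // (assumption: BigInt operands are out of scope here).
    if (op == Op::Sub) return Ty::Number;

    // `+` is numeric addition only when both operands are known numbers.
    if (lhs == Ty::Number && rhs == Ty::Number) return Ty::Number;

    // A string on either side makes `+` a concatenation.
    if (lhs == Ty::String || rhs == Ty::String) return Ty::String;

    // Otherwise the result could be either, so the value must stay boxed.
    return Ty::Unknown;
}
```

The asymmetry between the two operators is the whole story: Unknown + Number has to stay Unknown, and that one conservative answer is what moves an additive recursive function from native registers back to boxed arithmetic until both operands can be proven numeric.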

6. Compiled-artifact caching blockers

Caching compiled output across repeated compiles (the web-template suite recompiles byte-identical vendor files dozens of times — bootstrap ×116, jquery variants ×58/45/30) is attractive but currently blocked (Tune6 §3.4). After the interpreter policy (§3) removed JIT codegen from the cold path, the cacheable cost shrank: the realm-safe AST-level cache saves only ≈5–15% because MIR lowering, not parse/AST, dominates per file. The valuable slice — MIR-module reuse — is blocked because the JS→MIR lowering bakes ≈59 raw realm pointers into modules as integer constants (interned String->chars at js_mir_expression_lowering.cpp:2069, plus ctor_prop_ptrs, shape_cache_ptr, and inline-cache pointers in js_mir_calls_boxing_types.cpp). A deserialized or cross-realm-reused module would dereference stale pointers, so caching needs a de-pointered, relocatable MIR lowering first. Likewise an eval compile cache was implemented and reverted (Tune2 §3.2) because the targeted eval forms route around the cache (regex-literal fast path, or Phase-C var-declaration compiles); see JS_04 §10 for the eval tiers.
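One possible shape for a "de-pointered" module is to record symbol text at compile time and resolve it to addresses in a per-realm binding step. This is purely illustrative and not a design from the logs: Realm, RelocatableModule, and bind_module are hypothetical names, and a real MIR module would need the same treatment for shape caches and inline-cache pointers, which this sketch does not attempt.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical realm that owns its interned strings. std::set is node-based,
// so the address of an element stays valid as more strings are interned.
struct Realm {
    std::set<std::string> interned;

    const std::string* intern(const std::string& s) {
        return &*interned.insert(s).first;
    }
};

// Relocatable form: the module records symbol text, never an address, so it
// can be cached, serialized, or shared between realms.
struct RelocatableModule {
    std::vector<std::string> symbols;
};

// Per-realm binding step: resolve each symbol to a pointer that is valid in
// that realm only. Compiled code would index this table instead of embedding
// the pointer as an integer constant.
inline std::vector<const std::string*> bind_module(const RelocatableModule& m,
                                                   Realm& realm) {
    std::vector<const std::string*> table;
    table.reserve(m.symbols.size());
    for (const auto& s : m.symbols) table.push_back(realm.intern(s));
    return table;
}
```

The cost of this style is an extra indirection on each use, so whether it pays off depends on how much of the per-file compile time the cached lowering would actually save.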


7. Benchmark suite & current JS pass rate

LambdaJS's primary performance test is the multi-suite benchmark harness under test/benchmark/. It doubles as a broad real-world correctness check: each suite is a set of standard JavaScript programs (ported from the V8/AWFY/Octane and Scheme R7RS/Larceny corpora, plus the Benchmarks-Game and kostya cross-language sets) that a conformant engine should run to completion and — where the program self-verifies — produce the correct result. The historical performance ratios versus V8 are in §4; this section records the current, as-shipped run on the release engine.

7.1 Suites and how Lambda JS runs them

Invocation is ./lambda.exe js <file> from the repo root (the engine exposes console.log, process, process.stdout.write, process.hrtime.bigint(), and performance.now(); it does NOT expose require, load, read, gc, or window, which is why the modular *2.js AWFY form and the stock Octane drivers do not run as-shipped). The master driver test/benchmark/run_benchmarks.py compares LambdaJS against Node/QuickJS/CPython and the Lambda .ls path; the per-suite run_bench.py scripts drive the non-JS engines; test/benchmark/jetstream/run_jetstream_ljs.py is the JS-on-LambdaJS JetStream runner; result snapshots live in test/benchmark/Overall_Result*.md.

| Suite | What it is | JS form run under Lambda JS | Pass criterion |
|---|---|---|---|
| awfy | Are-We-Fast-Yet micro + macro | self-contained *2_bundle.js (the *2.js form uses Node require) | self-prints <Name>: PASS/FAIL from verifyResult() |
| r7rs | Scheme-derived micro | self-contained *2.js with a main() | self-prints <name>: PASS against a hardcoded checksum |
| larceny | Larceny/Gambit Scheme ports | plain self-contained *.js | checksum PASS/FAIL (2 print only DONE/nothing) |
| kostya | kostya cross-language set | self-contained *.js | checksum PASS/FAIL (2 print only DONE/nothing) |
| beng | Benchmarks Game | self-contained beng/js/*.js | none built in; compute + console.log only (pass = ran to completion) |
| octane | V8 Octane | each file only registers a BenchmarkSuite; needs a synchronous driver (none ships for Lambda JS) | internal throw on checksum mismatch |
| jetstream | JetStream subset | run_jetstream_ljs.py strips the trailing class Benchmark {} and appends a timing loop | ran-to-completion (only crypto-md5 self-checks a result) |

7.2 Current pass rate

The table is the initial audit (release build, 2026-06-16) updated for the three wrong-result fixes landed afterward (bounce, levenshtein, crypto-md5 — see §7.3). Those are correctness fixes in the transpiler, so they hold on any build; each was verified by the benchmark passing on JIT and interpreter and make test-lambda-baseline at 3169/3169. Timing figures should be refreshed on a fresh release build. "Verified" means the benchmark self-checked a result/checksum; "ran-only" means it completed without error but the program has no built-in result check (so it confirms "executes without error", not "verified correct").

| Suite | Pass / total | Quality | Remaining failures |
|---|---|---|---|
| awfy | 12 / 14 | verified | havlak, cd timeout |
| r7rs | 10 / 10 | verified | |
| larceny | 12 / 12 | verified (gcbench, pnpoly ran-only) | |
| kostya | 7 / 7 | verified (brainfuck, matmul ran-only) | |
| beng | 10 / 10 | ran-only (no self-check) | |
| octane | 2 / 6 | checksum-throw (custom driver) | box2d wrong result; pdfjs, typescript error; earley-boyer timeout |
| jetstream | 10 / 13 | ran-to-completion | navier-stokes, hashmap, raytrace3d timeout |
| Overall | 63 / 72 (≈88%) | | box2d + 2 errors + 6 timeouts remain |

Caveats: Octane needs a hand-rolled synchronous driver because no in-repo Lambda-JS Octane runner exists (under the stricter "runnable as shipped" reading, Octane is effectively 0/6); roughly a third of the passes are "ran-only", confirming execution but not result correctness; and timeouts are single-run wall-time, so a heavy macro-benchmark that is merely very slow on this machine is not distinguished from one that hangs.

7.3 Failures — fixed and open

Fixed (this session) — three wrong-result correctness bugs, all in the transpiler's optimization passes (not the runtime); each verified by the benchmark passing on JIT and interpreter plus a clean make test-lambda-baseline (3169/3169):

  • awfy/bounce (was 1321 vs 1331) — the §7 shaped-slot fast path did a raw MIR_DMOV/MIR_MOV load trusting the compile-time field type, but a FLOAT-inferred field can hold a tagged int at runtime (this.x = … % 500); fixed by reading through the type-guarded js_get_slot_f/js_get_slot_i, which coerces by the slot's runtime entry->type (js_mir_calls_boxing_types.cpp, js_mir_expression_lowering.cpp).
  • kostya/levenshtein — the P4h loop-invariant typed-array data-pointer hoist did not treat [prev,curr]=[curr,prev] as a reassignment, so it cached stale Int32Array pointers across the swap; fixed by marking destructuring-LHS targets unsafe (jm_mark_destructure_targets_unsafe, js_mir_function_collection_class_inference.cpp).
  • jetstream/crypto-md5 — the P6 single-return inliner bound each parameter before evaluating later arguments with no hygiene, so an inlined argument's free variable was shadowed by a same-named just-bound parameter; fixed by evaluating all arguments in the caller scope before pushing the inline scope (jm_transpile_inline_native, js_mir_expression_lowering.cpp).
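The bounce fix is a good example of why a fast path has to trust the runtime type rather than the compile-time inference. The sketch below models the difference with a hypothetical Slot that carries a type tag beside its payload; read_raw_as_double and read_guarded_as_double are illustrative names, and the engine's actual Item encoding and js_get_slot_f implementation are not reproduced.

```cpp
#include <cstdint>
#include <cstring>

enum class SlotType : std::uint8_t { Int, Float };

// Hypothetical slot: a runtime type tag next to a 64-bit payload that holds
// either an int64 or the bit pattern of a double.
struct Slot {
    SlotType type;
    std::uint64_t bits;
};

inline Slot make_int(std::int64_t v) {
    Slot s{SlotType::Int, 0};
    std::memcpy(&s.bits, &v, sizeof v);
    return s;
}

inline Slot make_float(double v) {
    Slot s{SlotType::Float, 0};
    std::memcpy(&s.bits, &v, sizeof v);
    return s;
}

// Unguarded read: trusts the compile-time belief that the slot is a double
// and reinterprets the payload. Wrong whenever the slot holds an int.
inline double read_raw_as_double(const Slot& s) {
    double d;
    std::memcpy(&d, &s.bits, sizeof d);
    return d;
}

// Guarded read: checks the runtime tag and converts an int payload to a
// double instead of reinterpreting its bits.
inline double read_guarded_as_double(const Slot& s) {
    if (s.type == SlotType::Int) {
        std::int64_t i;
        std::memcpy(&i, &s.bits, sizeof i);
        return static_cast<double>(i);
    }
    return read_raw_as_double(s);
}
```

For a slot that really holds a double the two reads agree, so the guard costs one predictable branch on the common path and only changes the answer in the case the raw load got wrong.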

Open — wrong result / error. octane/box2d throws Cannot read properties of null (reading 'x') (object/prototype semantics — not yet investigated). octane/pdfjs runs ~80 s then throws Promise resolver is not a function — the Promise constructor does not invoke the executor callback (see JS_09 — Async & Modules); octane/typescript runs ~83 s then throws ... is not a function (a missing builtin on the TS-compiler path).

Open — timeout (heavy/slow or hung). awfy/havlak, awfy/cd, octane/earley-boyer, jetstream/navier-stokes, jetstream/hashmap, jetstream/raytrace3d exceeded the run cutoff (45–120 s) — historically the engine's hardest macro-benchmarks, where the float-boxing and polymorphic-dispatch gaps in §5 bite hardest.


Known Issues & Future Improvements

Still-open performance work, distilled from the logs:

  1. Inline float fields without boxing — the single biggest remaining gap (§5.1). Keep shaped float fields in native registers across a loop iteration (scalar replacement of aggregates), or store doubles inline, to remove push_d nursery allocation on float-heavy loops.
  2. Pristine-prototype guard + realm-scoped intrinsic cache (§5.2) — to retire the per-call arr.push override check and the per-call name re-interning that feeds a broad ≈2× regression. Must be realm-scoped, not process-global.
  3. Shape-based polymorphic method dispatch ("P3b") (§5.3) — a bounded polymorphic inline cache so richards/deltablue stop falling back to runtime dispatch.
  4. Two-operand-non-string ADD inference + fixed-point return types (§5.4) — recover native integer typing for additive/recursive numeric functions without resurrecting the string-concat unsoundness.
  5. Destination-passing lowering (§5.5) — the structural fix for the 66–88% MOV volume; a scoped codegen-quality project, gated on full test262 + Radiant re-validation.
  6. De-pointered relocatable MIR + module cache (§6) — unblock cross-compile/cross-realm artifact reuse for the repeated-vendor-JS workload.
  7. Sys-func registry: production-only gate (Tune8 §4) — wrap the 15 test262-only fast-path emit sites so a JS_TEST262_FAST_PATHS=0 build actually links and drops them; currently registry-side only.
  8. Grow the TypeMap hash / sort the method-spec tables (JS_06 §11) — the capacity-32 hash silently stops inserting and builtin-method lookup is a linear strncmp; both degrade objects/classes with many members.

Appendix A — Source map

| File | Responsibility (this doc) |
|---|---|
| lambda/js/js_runtime_function.cpp | Transient call-argument stack (js_args_push/_save/_restore, js_alloc_env fallback). |
| lambda/js/js_mir_expression_lowering.cpp | Constant folding (jm_try_fold_const, jm_emit_folded_at_value_site), native arithmetic, const-bound dispatch, method-dispatch devirtualization. |
| lambda/js/js_mir_calls_boxing_types.cpp | jm_is_native_type, jm_box_native, boxing/unboxing, the import_cache. |
| lambda/js/js_mir_function_collection_class_inference.cpp | Dual native-version inference (has_native_version, native_func_item, param_types). |
| lambda/js/js_runtime.cpp | js_map_get_fast, shaped-slot read/cache (js_get_shaped_slot, js_constructor_create_object_shaped_cached), MapKind dispatch, regex property cursor, 1-char ASCII fast path. |
| lambda/js/js_globals.cpp | g_ascii_char_pool / js_intern_ascii_char. |
| lambda/lambda-data.hpp | TypeMap hash table + slot_entries[], MapKind. |
| lambda/js/js_mir_entrypoints_require.cpp | Interpreter-vs-JIT selection, insn-count thresholds, link interface. |
| lambda/js/js_typed_array.cpp | TypedArray raw bulk copy/convert paths (LAMBDA_JS_TA_RAW_FAST). |
| lambda/sys_func_registry.c | JIT import table (registry size). |
| lambda/js/js_mir_internal.hpp, transpile_js_mir.cpp | JM_LARGE_MODULE_INSN_THRESHOLD, JM_RADIANT_INTERP_INSN_THRESHOLD, opt level. |