Lua VM Roadmap

June 16, 2026 · View on GitHub

This is the strategic overview. For per-PR detail, see .agents/plans/.

Status: 2026-06-15

Version: 1.0.0-rc.3 shipped (rc.0–rc.3 all released).
Lua 5.3 official suite: 17/29 files passing. 8 fully clean (api, bitwise, code, nextvar, simple_test, tpack, utf8, vararg) plus 9 passing with documented skip-ranges (all, calls, constructs, events, gc, goto, literals, locals, pm). 4 deliberate non-goals (main, files, attrib, verybig). 8 still whole-file-skipped pending triage (big, closure, coroutine, db, errors, math, sort, strings).
Current focus: closing the 1.0.0 release gates in A13 — the suite gate (aim 20/29 with 9 documented exclusions; 17 pass today, three triage targets remain: strings/sort/math) and a one-time perf-gate check (fib already at 1.03–1.11× Luerl after #324/#360).

Release sequencing (decided 2026-06-15)

The 1.0.0 milestone is deliberately lean: its sole job is to freeze the public API around the completed Luerl → Elixir-native rewrite. We do not add new public surface in the release that freezes the surface.

1.0.0 — stabilize the rewrite. Suite + perf + docs gates only. The rewrite already meets or beats Luerl where it counts (native VM, structured errors, parse_structured/1, :max_instructions, O(1) #t/pairs); the dimensions we trail (coroutines, full io, gc/weak-tables) are documented deliberate non-goals, not gaps. Four milestone issues were already satisfied by rc.3 and closed (#77, #89, #92, and #87, the latter moot post-Luerl).
1.1.0 — additive DX + virtualization plumbing on the frozen API: the Lua.Encoder protocol + deflua auto-marshalling (#341), and the VFS plumbing that routes os/require IO through an in-memory virtual filesystem (#297, PR #302) with the deny-list unchanged. All additive → minor bump.
2.0.0 — full virtualization: flip the sandbox default so os/io operate against the VFS and nothing reaches the host. This changes observable behaviour (today's sandbox refusals become successes), so it is an honest major bump, built on 1.1's plumbing as a config-flip rather than a rewrite.

Rationale: bundling a brand-new public protocol (Lua.Encoder) into the API freeze would commit us to it before it is battle-tested; de-sandboxing is a major-version event by nature, so doing it as 2.0 soon after 1.0 is SemVer working as intended, not a delay.

Done

The new Elixir-native VM (replacing Luerl) is built up through:

Foundation (Phases 0–10): lexer, parser, codegen, register-based executor, value encoding/decoding, public Lua.* API integration. Luerl removed.
Phase 11: Compiler fundamentals — multi-assign, break, goto/label, Statement.Do, LocalFunc.
Phase 12: Full metamethod dispatch (__index, __newindex, __call, arithmetic/comparison/length/concat/tostring metamethods).
Phase 13: String pattern engine (string.find/match/gmatch/gsub).
Phase 14a: Bitwise correctness, math return types.
Phase 15: debug library, module registration polish.
Phase 16: string.format width/precision support.
Phase 17: Vararg expansion, scope fixes, _G, _ENV, hex floats, multi-return.
Performance baseline: benchmark harness vs Luerl and C Lua (PR #143).
Performance wins on main:
- Right-size register tuple allocations (PR #153).
- O(N²) → O(N) upvalue collection in closure handler (PR #154).
- O(1) upvalue access by storing upvalues as a tuple (PR #155).
- Fully tail-recursive CPS executor with line tracking off heap (PR #156).
- Fast-path executor dispatch (numeric arith, comparisons, string concat, get_field / set_field) (PR #223).
- In-range fast path for Numeric.to_signed_int64/1 (B8, PR #227). -3% on fib(30).
- Bench harness: quick mode + multi-n inputs via LUA_BENCH_MODE (PR #230). 17 min → 80 s for the full suite in quick mode; full mode preserved for publishable numbers.

In flight: Direction A — Suite Triage (milestone `1.0.0`)

Goal: push the official Lua 5.3 test suite from 6/29 to a healthier pass rate, without regressing unit tests, then cut 1.0.0-rc.1.

The version jump from 0.4 to 1.0 reflects the magnitude of the VM rewrite (Luerl is no longer a runtime dependency) and a commitment to public API stability. Cutting an rc first leaves room to catch regressions before locking 1.0 final.

Per-PR plans live in .agents/plans/A*.md. Issues track them under the 0.5.0 milestone.

High-leverage fixes (one bug → many files)

~~A0: 64-bit integer overflow wrapping for arithmetic and bitwise ops (Lua 5.3 §3.4.1; deliberate divergence from Luerl bignum semantics).~~ Shipped in PR #177.
~~A1: Empty/missing-key table reads return nil (unblocks ~6 files).~~ In review in PR #179.
~~A2: Long-string [[ … ]] lexer handles embedded ] and level brackets [==[.~~ In review in PR #180.
~~A3: Comment tokens leak past lexer in calls.lua.~~ In review in PR #182.
A4: Pre-load Lua stdlibs into package.loaded so require"io" resolves.

Per-file assertions (one PR each, ≤ ½ day)

A5–A9: bitwise, locals, nextvar, events, pm.

Investigations

A10: Timeouts in big.lua, closure.lua, utf8.lua.

Polish

A11: Clear in-source TODOs (compiler.ex:34, compiler_exception.ex:27, stdlib.ex:412). Shipped in PR #194.
A12: README and CHANGELOG for 1.0.0-rc.1.
A13: Cut 1.0.0-rc.1 (blocked on the rest).

Direction B — Performance (`1.0.x`)

Several B-direction wins landed early on (PRs #153–#156, #223). The B4–B8 sweep in May 2026 then attempted four larger architectural levers; the results are summarised here so the lessons survive the ephemeral plan files.

Shipped

B8 — Numeric narrowing fast path (PR #227). Guard-clause short circuits Numeric.to_signed_int64/1 for in-range integers. −3.3% on fib(30) chunk, no regressions. The realised win came entirely from the guard short-circuit; @compile {:inline, ...} does not cross module boundaries, so the cross-module call sites in Executor / Value still trip a function boundary.
Bench harness rework (PR #230). LUA_BENCH_MODE=quick (default) cuts the full suite from ~17 min to ~80 s; LUA_BENCH_MODE=full preserves the long windows plus a multi-n sweep ({10, 100, 1000}) for the table workloads. This harness is what surfaced B7's scale regression — the single-n measurement we had before would have hidden it.

Tried and deferred (with findings)

B6 — Eliminate per-tref Map.fetch! re-resolution. Deferred in PR #229 / #231. Post-PR #223 profile no longer supports the hypothesis: Map.get is ~3.3% on fib(22) and ~0.04% on table_build. The earlier headline number (~6.4%) was absorbed by the fast-path work in PR #223. The remaining audit cleanup is worth doing later as a refactor, not as a perf plan.
B7 — Array + hash split for Lua.VM.Table. Implemented in PR #229, closed unmerged. Wins at small n (-14% to -21% at n=100), loses badly at large n (+30% to +40% at n=1000). Memory regresses 3-5x at n=1000. The crossover is structural: BEAM tuples are immutable, so every setelement/3 on a 1024-cell tuple copies the whole tuple. PUC-Lua avoids this with in-place mutation in C; we cannot. A future plan could revisit with threshold-based promotion (stay in the data map until array_len ≥ N, then promote) — the small-n wins are real and worth preserving if the regression can be avoided.
B4 — Flat instruction stream + PC dispatch. Implemented end-to- end on a throwaway branch (all 1705 tests + 29 lua53 suite tests passed), closed unmerged (PR #233 records the findings). fib(30) regressed 3%; do_execute self-time was unchanged (50.6% vs main's 50.8%). On the BEAM, [head | rest] head-match destructures head + tail in one op while case :erlang.element(pc + 1, instrs) do is two ops (fetch + case discriminate); the hoped-for jump- table optimization did not produce a net win. The Lua.Compiler.Linearize design that the implementation used is reusable as a compile-time input to B5 without affecting the runtime executor.

What we learned

Measure against today's profile, not the plan's old profile. B6's hypothesis was already obsolete when we got to it — PR #223 had absorbed the win. Each B-plan should re-baseline before starting.
Multi-n measurement is essential for table workloads. A single n=500 data point is right on the BEAM-tuple-copy crossover for B7-style array promotion; either side of that crossover tells a completely different story. The bench harness rework was net positive for the rest of the series — without it the B7 regression at scale would have shipped.
BEAM optimisations are subtle. [head | rest] head-matching is heavily optimized and is hard to beat with case-on-tuple-element. @compile {:inline, ...} does not cross module boundaries. Refactors that should help on theoretical grounds may not on the BEAM specifically; we have to measure.
Immutable data structures bound how fast we can be. B7 hit this with setelement/3 on large tuples. The same constraint shapes what B5 can deliver — register-tuple setelement/3 is still 25% of every workload's profile and the BEAM gives us no way around that without going outside the VM (NIFs, ETS, persistent_term).

Remaining lever: B5 — Compile prototypes to Erlang functions

B5 is the architectural lever for serious throughput: translate each %Lua.Compiler.Prototype{} to an Erlang function body and call :compile.forms/2, letting the BEAM JIT (BEAMASM on OTP 25+) natively optimize the hot path. Plan stretch: fib parity with Luerl (±5%). Plan: .agents/plans/B5-compile-prototypes-to-erlang.md.

B4's deferral does not block B5: the Lua.Compiler.Linearize implementation from B4 can be reintroduced as a compile-time preparation step (feeding B5's codegen flat bytecode) without touching the runtime executor.

B5 is larger than B4 — full Erlang-AST codegen, module compile / load / purge lifecycle, fallback path for opcodes not yet translated. The plan acknowledges that landing the framework is itself a multi-month effort. Default position until a clear motivating workload appears: paused, with the implementation findings above documenting why incremental dispatch-shape work is unlikely to move the needle.

Deferred (intentional, not in 1.0)

These suite files exercise capabilities that conflict with this library's role as a sandboxed embedded Lua VM. They are tracked as deliberate non-goals, not "missing features we'd take a PR for". Per-file rationale lives alongside the @deferred_permanent map in test/lua53_suite_test.exs.

Standalone interpreter (main.lua) — shells out via os.execute, writes Lua programs to temp files, invokes lua as a subprocess. We are an embedded VM with no shell-out and no standalone interpreter.
File I/O (files.lua) — io.open, io.input, io.output, io.lines, io.read, io.write, io.tmpfile, plus os.getenv, os.remove, os.rename. io.* is a stub by design.
require semantics that need filesystem I/O (attrib.lua) — writes libs/A.lua, libs/B.lua, etc. to disk and dynamically loads them.
>64K constants harness via tmpfile + dofile (verybig.lua) — the RK/ large-constants behaviour is interesting, but the harness writes a generated program with os.tmpname()/io.output() and dofiles it. A future plan could stub these for the suite runner only.

Other deferrals in this milestone:

Coroutines (coroutine.lua) — full continuation/process model, weeks of work.
Garbage collection / weak tables (gc.lua).
Full debug library (db.lua).
C-stack tests (cstack.lua).
Backward goto and goto-out-of-conditional (3 skipped unit tests in test/lua/compiler/integration_test.exs).

Glossary

Suite — the official Lua 5.3 test files in test/lua53_tests/.
Plan — a single-PR-shaped chunk of work, lives in .agents/plans/.
Direction — strategic grouping (A = correctness/suite, B = performance).
Milestone — GitHub milestone tracking direction-scoped issues for a release.

Cadence

The agent updates the Status section above on each merged PR via the ship-a-plan skill.
The human (Dave) updates the In flight / Next / Deferred sections on Mondays or whenever strategy shifts.