Lua VM Roadmap
June 16, 2026
This is the strategic overview. For per-PR detail, see .agents/plans/.
Status: 2026-06-15
- Version: 1.0.0-rc.3 shipped (rc.0–rc.3 all released).
- Lua 5.3 official suite: 17/29 files passing. 8 fully clean (api, bitwise, code, nextvar, simple_test, tpack, utf8, vararg) plus 9 passing with documented skip-ranges (all, calls, constructs, events, gc, goto, literals, locals, pm). 4 deliberate non-goals (main, files, attrib, verybig). 8 still whole-file-skipped pending triage (big, closure, coroutine, db, errors, math, sort, strings).
- Current focus: closing the 1.0.0 release gates in A13: the suite gate (aim 20/29 with 9 documented exclusions; 17 pass today, three triage targets remain: strings/sort/math) and a one-time perf-gate check (fib already at 1.03–1.11× Luerl after #324/#360).
Release sequencing (decided 2026-06-15)
The 1.0.0 milestone is deliberately lean: its sole job is to freeze the public API around the completed Luerl → Elixir-native rewrite. We do not add new public surface in the release that freezes the surface.
- 1.0.0: stabilize the rewrite. Suite + perf + docs gates only. The rewrite already meets or beats Luerl where it counts (native VM, structured errors, parse_structured/1, :max_instructions, O(1) #t/pairs); the dimensions we trail (coroutines, full io, gc/weak-tables) are documented deliberate non-goals, not gaps. Four milestone issues were already satisfied by rc.3 and closed (#77, #89, #92, and #87, the latter moot post-Luerl).
- 1.1.0: additive DX + virtualization plumbing on the frozen API, namely the Lua.Encoder protocol + deflua auto-marshalling (#341), and the VFS plumbing that routes os/require IO through an in-memory virtual filesystem (#297, PR #302) with the deny-list unchanged. All additive → minor bump.
- 2.0.0: full virtualization. Flip the sandbox default so os/io operate against the VFS and nothing reaches the host. This changes observable behaviour (today's sandbox refusals become successes), so it is an honest major bump, built on 1.1's plumbing as a config-flip rather than a rewrite.
Rationale: bundling a brand-new public protocol (Lua.Encoder) into the
API freeze would commit us to it before it is battle-tested; de-sandboxing
is a major-version event by nature, so doing it as 2.0 soon after 1.0 is
SemVer working as intended, not a delay.
Done
The new Elixir-native VM (replacing Luerl) is built up through:
- Foundation (Phases 0–10): lexer, parser, codegen, register-based executor, value encoding/decoding, public Lua.* API integration. Luerl removed.
- Phase 11: Compiler fundamentals: multi-assign, break, goto/label, Statement.Do, LocalFunc.
- Phase 12: Full metamethod dispatch (__index, __newindex, __call, arithmetic/comparison/length/concat/tostring metamethods).
- Phase 13: String pattern engine (string.find/match/gmatch/gsub).
- Phase 14a: Bitwise correctness, math return types.
- Phase 15: debug library, module registration polish.
- Phase 16: string.format width/precision support.
- Phase 17: Vararg expansion, scope fixes, _G, _ENV, hex floats, multi-return.
- Performance baseline: benchmark harness vs Luerl and C Lua (PR #143).
- Performance wins on main:
- Right-size register tuple allocations (PR #153).
- O(N²) → O(N) upvalue collection in closure handler (PR #154).
- O(1) upvalue access by storing upvalues as a tuple (PR #155).
- Fully tail-recursive CPS executor with line tracking off heap (PR #156).
- Fast-path executor dispatch (numeric arith, comparisons, string concat, get_field/set_field) (PR #223).
- In-range fast path for Numeric.to_signed_int64/1 (B8, PR #227). -3% on fib(30).
- Bench harness: quick mode + multi-n inputs via LUA_BENCH_MODE (PR #230). 17 min → 80 s for the full suite in quick mode; full mode preserved for publishable numbers.
In flight: Direction A — Suite Triage (milestone 1.0.0)
Goal: push the official Lua 5.3 test suite from 6/29 to a healthier
pass rate, without regressing unit tests, then cut 1.0.0-rc.1.
The version jump from 0.4 to 1.0 reflects the magnitude of the VM
rewrite (Luerl is no longer a runtime dependency) and a commitment to
public API stability. Cutting an rc first leaves room to catch
regressions before locking 1.0 final.
Per-PR plans live in .agents/plans/A*.md. Issues track them
under the 0.5.0 milestone.
High-leverage fixes (one bug → many files)
- A0: 64-bit integer overflow wrapping for arithmetic and bitwise ops (Lua 5.3 §3.4.1; deliberate divergence from Luerl bignum semantics). Shipped in PR #177.
- A1: Empty/missing-key table reads return nil (unblocks ~6 files). In review in PR #179.
- A2: Long-string [[ … ]] lexer handles embedded ] and level brackets [==[. In review in PR #180.
- A3: Comment tokens leak past lexer in calls.lua. In review in PR #182.
- A4: Pre-load Lua stdlibs into package.loaded so require"io" resolves.
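As a rough illustration of the wrapping behaviour A0 describes, here is a minimal Elixir sketch. It is not the code from PR #177: the module name WrapSketch and its functions are hypothetical, and the shift helper only covers shift counts 0..63. It relies on the fact that building a 64-bit binary segment truncates an arbitrary-precision integer to its low 64 bits, which can then be re-read as a signed value.

```elixir
defmodule WrapSketch do
  # Hypothetical helper, not the project's API.
  # Truncate an arbitrary-precision BEAM integer to its low 64 bits,
  # then reinterpret those bits as a two's-complement signed integer.
  def wrap(n) when is_integer(n) do
    <<wrapped::signed-integer-size(64)>> = <<n::integer-size(64)>>
    wrapped
  end

  # Compute with bignums first, then wrap the result.
  def add(a, b), do: wrap(a + b)
  def mul(a, b), do: wrap(a * b)

  # Left shift for counts 0..63 only; other counts are out of scope here.
  def shl(a, bits) when is_integer(a) and bits in 0..63 do
    wrap(Bitwise.bsl(a, bits))
  end
end
```

Under these assumptions, adding 1 to the largest signed 64-bit integer yields the smallest one instead of growing into a bignum.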
Per-file assertions (one PR each, ≤ ½ day)
- A5–A9: bitwise, locals, nextvar, events, pm.
Investigations
- A10: Timeouts in big.lua, closure.lua, utf8.lua.
Polish
- A11: Clear in-source TODOs (compiler.ex:34, compiler_exception.ex:27, stdlib.ex:412). Shipped in PR #194.
- A12: README and CHANGELOG for 1.0.0-rc.1.
- A13: Cut 1.0.0-rc.1 (blocked on the rest).
Direction B — Performance (1.0.x)
Several B-direction wins landed early on (PRs #153–#156, #223). The B4–B8 sweep in May 2026 then attempted four larger architectural levers; the results are summarised here so the lessons survive the ephemeral plan files.
Shipped
- B8: Numeric narrowing fast path (PR #227). Guard-clause short circuits Numeric.to_signed_int64/1 for in-range integers. −3.3% on fib(30) chunk, no regressions. The realised win came entirely from the guard short-circuit; @compile {:inline, ...} does not cross module boundaries, so the cross-module call sites in Executor/Value still trip a function boundary.
- Bench harness rework (PR #230). LUA_BENCH_MODE=quick (default) cuts the full suite from ~17 min to ~80 s; LUA_BENCH_MODE=full preserves the long windows plus a multi-n sweep ({10, 100, 1000}) for the table workloads. This harness is what surfaced B7's scale regression; the single-n measurement we had before would have hidden it.
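The guard-clause shape behind B8 can be sketched as follows. This is an assumption about the general pattern, not the body of Numeric.to_signed_int64/1 from PR #227; the module name NarrowSketch is hypothetical. The first clause returns in-range integers untouched, so only out-of-range values pay for the masking work in the second clause.

```elixir
defmodule NarrowSketch do
  import Bitwise

  # Fast path: the guard matches any integer already inside the signed
  # 64-bit range and returns it without further work.
  def to_signed_int64(n)
      when is_integer(n) and n >= -9_223_372_036_854_775_808 and
             n <= 9_223_372_036_854_775_807,
      do: n

  # Slow path: keep the low 64 bits, then map the upper half of the
  # unsigned range onto negative values.
  def to_signed_int64(n) when is_integer(n) do
    low = band(n, 0xFFFFFFFFFFFFFFFF)
    if low > 9_223_372_036_854_775_807, do: low - 0x10000000000000000, else: low
  end
end
```

Clause order matters in this sketch: Elixir tries clauses top to bottom, so the common in-range case is settled by the guard before the masking clause is considered.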
Tried and deferred (with findings)
- B6: Eliminate per-tref Map.fetch! re-resolution. Deferred in PR #229 / #231. Post-PR #223 profile no longer supports the hypothesis: Map.get is ~3.3% on fib(22) and ~0.04% on table_build. The earlier headline number (~6.4%) was absorbed by the fast-path work in PR #223. The remaining audit cleanup is worth doing later as a refactor, not as a perf plan.
- B7: Array + hash split for Lua.VM.Table. Implemented in PR #229, closed unmerged. Wins at small n (-14% to -21% at n=100), loses badly at large n (+30% to +40% at n=1000). Memory regresses 3-5x at n=1000. The crossover is structural: BEAM tuples are immutable, so every setelement/3 on a 1024-cell tuple copies the whole tuple. PUC-Lua avoids this with in-place mutation in C; we cannot. A future plan could revisit with threshold-based promotion (stay in the data map until array_len ≥ N, then promote); the small-n wins are real and worth preserving if the regression can be avoided.
- B4: Flat instruction stream + PC dispatch. Implemented end-to-end on a throwaway branch (all 1705 tests + 29 lua53 suite tests passed), closed unmerged (PR #233 records the findings). fib(30) regressed 3%; do_execute self-time was unchanged (50.6% vs main's 50.8%). On the BEAM, [head | rest] head-match destructures head + tail in one op while case :erlang.element(pc + 1, instrs) do is two ops (fetch + case discriminate); the hoped-for jump-table optimization did not produce a net win. The Lua.Compiler.Linearize design that the implementation used is reusable as a compile-time input to B5 without affecting the runtime executor.
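To make the two dispatch shapes compared in B4 concrete, here is a toy sketch. It uses a hypothetical three-opcode stack machine (the real executor is register-based and far larger), and DispatchSketch is an illustrative name only. run_list/2 dispatches by matching the head of an instruction list; run_pc/3 fetches from a tuple by program counter and then discriminates with case.

```elixir
defmodule DispatchSketch do
  # Shape 1: list head-match. Each clause destructures the current
  # instruction and the remaining program in a single pattern.
  def run_list([{:push, v} | rest], stack), do: run_list(rest, [v | stack])
  def run_list([:add | rest], [b, a | stack]), do: run_list(rest, [a + b | stack])
  def run_list([:halt | _rest], [top | _stack]), do: top

  # Shape 2: flat tuple plus program counter. The instruction is fetched
  # with :erlang.element/2 (one-based), then matched by case.
  def run_pc(instrs, pc, stack) do
    case :erlang.element(pc + 1, instrs) do
      {:push, v} ->
        run_pc(instrs, pc + 1, [v | stack])

      :add ->
        [b, a | rest] = stack
        run_pc(instrs, pc + 1, [a + b | rest])

      :halt ->
        hd(stack)
    end
  end
end

program = [{:push, 2}, {:push, 3}, :add, :halt]
IO.inspect(DispatchSketch.run_list(program, []))
IO.inspect(DispatchSketch.run_pc(List.to_tuple(program), 0, []))
```

Both calls compute the same result; the sketch only shows the structural difference and says nothing about relative speed, which the findings above establish by measurement.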
What we learned
- Measure against today's profile, not the plan's old profile. B6's hypothesis was already obsolete when we got to it — PR #223 had absorbed the win. Each B-plan should re-baseline before starting.
- Multi-n measurement is essential for table workloads. A single n=500 data point is right on the BEAM-tuple-copy crossover for B7-style array promotion; either side of that crossover tells a completely different story. The bench harness rework was net positive for the rest of the series; without it the B7 regression at scale would have shipped.
- BEAM optimisations are subtle. [head | rest] head-matching is heavily optimized and is hard to beat with case-on-tuple-element. @compile {:inline, ...} does not cross module boundaries. Refactors that should help on theoretical grounds may not on the BEAM specifically; we have to measure.
- Immutable data structures bound how fast we can be. B7 hit this with setelement/3 on large tuples. The same constraint shapes what B5 can deliver: register-tuple setelement/3 is still 25% of every workload's profile and the BEAM gives us no way around that without going outside the VM (NIFs, ETS, persistent_term).
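The immutability constraint in the last point can be seen in a few lines. This is a sketch with a hypothetical RegisterSketch module, not the executor's register code. Kernel.put_elem/3 is the zero-based Elixir counterpart of the one-based :erlang.setelement/3; it returns a new tuple and leaves the original as it was.

```elixir
defmodule RegisterSketch do
  # Hypothetical register file: a fixed-size tuple of slots.
  def fresh(size), do: Tuple.duplicate(nil, size)

  # Writing a register yields a new tuple; the input tuple is unchanged.
  def set(regs, index, value), do: put_elem(regs, index, value)
end

regs = RegisterSketch.fresh(4)
regs2 = RegisterSketch.set(regs, 1, 42)
IO.inspect({regs, regs2})
# prints {{nil, nil, nil, nil}, {nil, 42, nil, nil}}
```

Because the old tuple must stay valid, a write semantically produces a whole new tuple, which is why the cost noted above grows with tuple size.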
Remaining lever: B5 — Compile prototypes to Erlang functions
B5 is the architectural lever for serious throughput: translate each
%Lua.Compiler.Prototype{} to an Erlang function body and call
:compile.forms/2, letting the BEAM JIT (BeamAsm; OTP 24+ on x86-64, OTP 25+ on ARM64)
natively optimize the hot path. Plan stretch: fib parity with Luerl
(±5%). Plan:
.agents/plans/B5-compile-prototypes-to-erlang.md.
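The mechanism B5 builds on can be sketched in miniature. The example below is not taken from the B5 plan: CompileSketch, compile_doubler/1 and the generated double/1 function are hypothetical, and the hand-written abstract forms stand in for what a real translator would emit from a prototype. It builds Erlang abstract forms for a one-function module, compiles them in memory with :compile.forms/2, and loads the result with :code.load_binary/3.

```elixir
defmodule CompileSketch do
  # Build, compile and load a module exporting double/1, which is
  # roughly the Erlang function `double(X) -> X * 2.`
  def compile_doubler(module_name) when is_atom(module_name) do
    forms = [
      {:attribute, 1, :module, module_name},
      {:attribute, 1, :export, [{:double, 1}]},
      {:function, 1, :double, 1,
       [
         {:clause, 1, [{:var, 1, :X}], [],
          [{:op, 1, :*, {:var, 1, :X}, {:integer, 1, 2}}]}
       ]}
    ]

    # Compile the forms to a BEAM binary without touching the filesystem.
    {:ok, ^module_name, binary} = :compile.forms(forms, [])

    # Load the binary; the file name is informational only.
    {:module, ^module_name} =
      :code.load_binary(module_name, ~c"compile_sketch_generated", binary)

    module_name
  end

  # Unload a generated module: mark current code as old, then purge it.
  def unload(module_name) do
    :code.delete(module_name)
    :code.purge(module_name)
    :ok
  end
end

mod = CompileSketch.compile_doubler(:lua_proto_sketch)
IO.inspect(mod.double(21))
# prints 42
```

A real translator would also have to handle the module lifecycle and the fallback path for untranslated opcodes; the sketch covers only the compile-and-load step.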
B4's deferral does not block B5: the Lua.Compiler.Linearize
implementation from B4 can be reintroduced as a compile-time
preparation step (feeding B5's codegen flat bytecode) without
touching the runtime executor.
B5 is larger than B4 — full Erlang-AST codegen, module compile / load / purge lifecycle, fallback path for opcodes not yet translated. The plan acknowledges that landing the framework is itself a multi-month effort. Default position until a clear motivating workload appears: paused, with the implementation findings above documenting why incremental dispatch-shape work is unlikely to move the needle.
Deferred (intentional, not in 1.0)
These suite files exercise capabilities that conflict with this library's
role as a sandboxed embedded Lua VM. They are tracked as deliberate
non-goals, not "missing features we'd take a PR for". Per-file rationale
lives alongside the @deferred_permanent map in
test/lua53_suite_test.exs.
- Standalone interpreter (main.lua): shells out via os.execute, writes Lua programs to temp files, invokes lua as a subprocess. We are an embedded VM with no shell-out and no standalone interpreter.
- File I/O (files.lua): io.open, io.input, io.output, io.lines, io.read, io.write, io.tmpfile, plus os.getenv, os.remove, os.rename. io.* is a stub by design.
- require semantics that need filesystem I/O (attrib.lua): writes libs/A.lua, libs/B.lua, etc. to disk and dynamically loads them.
- >64K constants harness via tmpfile + dofile (verybig.lua): the RK/large-constants behaviour is interesting, but the harness writes a generated program with os.tmpname()/io.output() and dofiles it. A future plan could stub these for the suite runner only.
Other deferrals in this milestone:
- Coroutines (coroutine.lua): full continuation/process model, weeks of work.
- Garbage collection / weak tables (gc.lua).
- Full debug library (db.lua).
- C-stack tests (cstack.lua).
- Backward goto and goto-out-of-conditional (3 skipped unit tests in test/lua/compiler/integration_test.exs).
Glossary
- Suite: the official Lua 5.3 test files in test/lua53_tests/.
- Plan: a single-PR-shaped chunk of work, lives in .agents/plans/.
- Direction: strategic grouping (A = correctness/suite, B = performance).
- Milestone: GitHub milestone tracking direction-scoped issues for a release.
Cadence
- The agent updates the Status section above on each merged PR via the ship-a-plan skill.
- The human (Dave) updates the In flight / Next / Deferred sections on Mondays or whenever strategy shifts.