Lean Backend
July 2, 2026 · View on GitHub
Lean is the proof-export backend for Aver.
Use it when you want:
- Lean 4 artifacts for pure Aver code
- proof obligations for classified effectful code via Oracle lifting
- executable proof obligations from colocated
verify - universal theorems for supported
verify lawshapes, with explicit sampled/domain fallback for the rest - a path from Aver code to formal verification
This is not a second execution runtime for effectful programs.
Quick start
aver proof examples/formal/law_auto.av --verify-mode auto -o /tmp/law-auto-lean
cd /tmp/law-auto-lean && lake build
Requires a local Lean 4 toolchain (lean + lake). Aver generates the project, but does not bundle Lean itself.
What it generates
Generates a Lean 4 project:
out/
lakefile.lean
lean-toolchain
<Project>.lean
Scope
- exports pure core logic: types, pure functions, and decisions
- emits lifted pure forms for classified effectful functions used by Oracle laws
- skips unclassified effectful functions and
main - turns colocated
verify/verify lawintent into Lean proof artifacts
Oracle-lifted laws make effects explicit as theorem parameters. For example,
given rnd: Random.int = [fairDie] becomes a proof-side oracle argument with
the derived Random.int oracle signature. See oracle.md for the
classified effect set, stub signatures, Oracle law syntax, and trace assertions.
Verify emission
verify blocks become Lean proof obligations:
- default (
--verify-mode auto):example : <lhs> = <rhs> := by native_decide - fallback (
--verify-mode sorry):example : <lhs> = <rhs> := by sorry - theorem stubs (
--verify-mode theorem-skeleton): namedtheorem ... := by sorry
verify ... law ... always emits expanded sample theorems from given domains:
theorem ..._sample_n := by native_decide
The universal law theorem is always emitted as
theorem <fn>_law_<name> : ∀ ..., lhs = rhs := by .... The body is:
- a real auto-proof when the law shape matches one of the supported strategies (see
Conservative auto-proofs currently cover:below), or sorrywith an inline comment (-- verify law is sampled; universal proof must be provided manually) when no strategy matches
The per-sample and _checked_domain conjunction theorems always emit
alongside as kernel-checked native_decide evidence, so the proof
obligation is real and visible even when the universal body is sorry.
when clauses translate to extra theorem premises in Lean — both the
sampled-disjunction (x = sample₁ ∨ x = sample₂ ∨ ...) and the
when_expr = true clause itself. Generic auto-proof strategies still
run on those guarded laws, but unmatched shapes fall back to the
sorry universal theorem plus per-sample theorems described above.
Parser/render roundtrip laws such as parse(render(x)) = x currently
live in that fallback bucket unless some other generic shape
discharges them first.
When law <ident> names an existing pure function and the law body compares foo(args) against fooSpec(args), Aver treats that as a canonical spec reference:
- the generated theorem/comment uses the canonical
<fn>_eq_<spec>naming aver contextalso recordsfibSpecas a spec forfib- in
--verify-mode auto, the universal theorem body is auto-proven when the law shape matches a strategy; otherwise it lands withsorryand a comment for the user to fill in
Example:
fn fibSpec(n: Int) -> Int
match n
0 -> 0
1 -> 1
_ -> fibSpec(n - 1) + fibSpec(n - 2)
verify fib law fibSpec
given n: Int = [0, 1, 2, 3, 4, 5]
fib(n) => fibSpec(n)
Specs over invariants
This is the intended proof style in Aver:
- the author writes a simple pure spec function
- the author writes
verify impl law implSpec - the proof backend tries to connect implementation and spec
The goal is to avoid making the surface language proof-engineer-first.
Invariants still exist as a proof concept, especially for optimized implementations such as tail-recursive helpers, parsers with state, or accumulator-heavy code. But Aver tries to push those invariants down into the proof backend whenever possible, instead of making users write them first.
In short:
- user-facing Aver should prefer explicit specs
- the proof backend should absorb invariants where it can
- dropping to explicit invariant reasoning should be the exception, not the default workflow
If Aver cannot auto-prove the universal law shape in --verify-mode auto, the universal theorem body lands as sorry with an explanatory inline comment, and the per-sample + _checked_domain theorems still emit alongside as kernel-checked evidence. The proof obligation stays visible (and Lean will reject lake build until the user replaces the sorry), but the file compiles, so the case-level evidence remains useful even before someone closes the universal.
See How the auto-prover decides for the recognizer families currently covered.
The generated Lean prelude also includes one-character separator lemmas such as
AverString.split over String.join(_, sep) ++ sep for separator-free parts.
Those helper lemmas support exported code, but they are not themselves a claim
that delimiter-based parser/render laws are universally auto-proved.
How the auto-prover decides
There is no proof search and no AI at proof time. Each verify law runs through a fixed, deterministic decision tree of shape recognizers; the first recognizer that matches pins a strategy, and each strategy emits a fixed tactic script. The Lean kernel — not the recognizer — is the judge: a law is credited universal: true only when its theorem's #print axioms stays inside {propext, Classical.choice, Quot.sound} (in particular native_decide proofs never count — they trust the compiler's evaluator via Lean.ofReduceBool). A strategy that does not close degrades to a caught sorry, never a false proof.
The recognizer families, roughly in routing order:
- syntactic algebra — reflexive, commutative / associative / identity / anti-commutative wrapper shapes over
Intoperators. - spec equivalence — implementation-vs-spec laws closed structurally, by conservative
simpcleanup, or byomegaover linearIntarithmetic (including second-order recurrences with a pair-state tail-recursive worker, andInt.max/Int.minshapes). - induction — structural induction on list/ADT givens, with generalizing variants (
induction xs generalizing n) for accumulator-threaded and both-arguments-peeling shapes, and arm-level injection of bridge and sibling lemmas. - Map laws — shipped
Maplemmas (self-key and general-key) plus a map-fold homomorphism recognizer. - ground enumeration — laws over fixed enum/ADT constructor arguments, and laws whose every given ranges over a finite domain (
Bool, fieldless enums): exhaustivecasesplusrfl/decide, which computes straight through fuel wrappers on closed values. - ring identities over records — unconditional algebra laws of records with all-
Intfields (for example exact rationals compared by cross-multiplication,examples/data/rational.av): after unfolding the call cone, both sides distribute and AC-normalize to the same polynomial via a fixed package of coreIntring lemmas — commutativity, associativity, distributivity, negation/subtraction laws all close kernel-genuine, no Mathlib. - builtin facts — laws whose call cone bottoms out in builtins close by
simpover the cone plus prelude spec lemmas the compiler ships (for exampleInt.fromString_fromIntandString.slicefacts). - floor-division windows — laws over a power-of-two function, a floor-halving binary-exponent search (
Result.withDefault(Int.div(a, 2), 0)recursion), and the power-of-two window predicates built from them: positivity and the sum homomorphism of the power function, the scaled-significand window (2^(n-1) <= sig(a, b, n) < 2^nunderwhen b >= 1; a >= b; n >= 1), and the bit-width product window. Thesewhen-laws are stated and proven in true universal form: the emitted proof combines functional induction over the well-founded definitions with the core floor-division bridges (Int.le_ediv_iff_mul_le,Int.ediv_lt_iff_lt_mul) — kernel-genuine, no Mathlib. Seetests/fixtures/floor_window.av. These windows reproduce the integer-arithmetic core of a classic hardware floating-point-divider correctness proof; reproducing a known-hard result is deliberate, because it calibrates how much of such a proof the engine carries on its own. The frame outlives any single theorem — the next targets on deck are the self-hosted interpreter's own evaluator core and an exact-decimal arithmetic library for regulated computation. - synthesized lemmas about your functions — when a function matches a conservative shape gate (for example the canonical string-position scanner), the compiler synthesizes and kernel-proves a companion lemma about it and uses it in higher strategies such as the decimal render/parse roundtrip. When the gate does not match, nothing is emitted.
- fallback — bounded evidence only: per-sample and
_checked_domaintheorems vianative_decide, plus the universal theorem with an honestsorry.when-laws get a guarded-domain enumeration instead; their bounded statements carry an explicit statement class and are never credited as universal.
When the recipes run out, the escalation path is more Aver, not Lean: split the hard law into helper laws — each one is a runnable test in milliseconds — and once proven, earlier laws in the same file become rewrite ammunition for the laws below them. A proven helper law committed in scope is picked up automatically by the law below it (the SimpOverLemmas feedback loop); the the-method agent loop can propose those helper laws for you.
Termination is part of the same honesty story: structural recursion over your own types and recognized well-founded shapes (for example quicksort's mutual recursion) emit genuine total definitions; the remaining recursive shapes are emitted fuel-wrapped, with the fuel budget derived from a synthesized size measure of the call-site arguments.
What "kernel-genuine" does and does not cover
Kernel-genuine is a precise, narrow claim: the Lean kernel checked the proof of the theorem as translated. It certifies the tactics, not the translation. The Aver→Lean statement translator — the code that turns your verify law into a Lean proposition — is part of the trusted base. If the translator renders Int.div with the wrong rounding, or mistranslates a when guard, the kernel will still happily certify a true theorem about the wrong statement, and verify will still be green.
We do not hide this, and we are not building a verified translator (that is CompCert-scale work for a much smaller risk). Instead we lean on a validation the corpus already provides: every verify example dual-runs on the VM and, as an exported Lean example, through the translation. Each such example is one point where the VM's semantics and the translated semantics are checked to agree — a pointwise translation-validation test that runs for free on every example in the suite. The negative-divisor rounding question above, for instance, is pinned by any example whose sample exercises a negative dividend.
The honest gap is coverage: nobody has yet measured which language constructs and edge cases the dual-run corpus actually exercises. Making that coverage measurable — a construct × edge-case matrix against the existing examples — is in progress. Until then, treat kernel-genuine as "the tactics are sound and the statement matches the runtime on every point we have tested," not "the statement is provably the one you wrote."
Proof mode
Recommended mode:
aver proof my_module.av --verify-mode auto -o out/
That combination means:
- regular
verifycases become executable Lean checks vianative_decide - supported
verify lawshapes get real universal proofs - unsupported
verify lawshapes emit the universal theorem with asorrybody and an inline comment, plus the per-sample +_checked_domaintheorems as kernel-checked evidence - recursive pure code inside the supported proof subset is emitted as total Lean defs
- unsupported recursive pure functions are called out explicitly and emitted with
partialfallback
When a law lands on the sorry fallback and you want to know why, see transpilation.md → Debugging a law that didn't auto-prove for the --emit-ir-after=law_lower workflow.
The current proof export supports:
- single-function
Intcountdown on anIntparameter (n -> n - 1). Closed-world fns (noexposesclause, or absent from the list) with the canonicalmatch p { L -> base; _ -> rec(p-1, ...) }body emit as a native aux def carrying a precondition extracted from the unique external caller's surroundingmatch/ifguards —(h_dom : n ≥ 0)fromfib'smatch (n < 0) { false -> ... }arm, or compound predicates like(h_dom : n > 2 ∧ n < 500)from nested caller guards. The aux is wrapped by a thin public def preserving the original signature.Lean omegacloses the per-callsite preservation obligation and theInt.natAbs ndecrease automatically. Other countdown shapes (exposed fns, multi-caller fns, leading let-bindings) fall back to fuel-encoded helpers. - single-function
Intfloor-division countdown by a literal divisor — every self-call shrinks anIntparameter throughResult.withDefault(Int.div(p, k), d)with literalk >= 2(inlined, or through a unary wrapper likefn half(a) = Result.withDefault(Int.div(a, 2), 0)), and the guards enclosing every self-call provably implyp >= 1(e.g.match p > 0, or the binary-exponent pairb >= 1anda >= 2 * b). Emits a genuine well-founded def (termination_by p.toNat, kernel-checked decrease) instead of a kernel-opaquepartial def; an unvalidated guard declines, never guesses. - single-function second-order affine
Intrecurrences withn < 0guard,0/1case split, and a matching pair-state tail worker, emitted via a privateNathelper - single-function structural recursion on any
List<_>parameter - single-function
String + posrecursion on(String, Int)signatures - mutual recursion SCC with first-parameter
Intcountdown - mutual recursion SCC with ranked
String + posprogress - mutual recursion SCC with ranked structural descent over recursive parameters (emitted as native
mutual ... termination_by ... endblock when every SCC member has aList/VectorsizeOf measure; fuel-encoded otherwise)
Checking an export (--check / --check-json)
aver proof file.av --backend lean -o out/ --check builds the export with
lake and gates on the result; --check-json prints one machine-readable
summary line. Fields of the Lean summary:
passed— the build succeeded within the budgets (a bounded verify-on-domain still passes; this is deliberately the lenient gate)sorries— residualsorrycount across the build output (--sorry-budget Ntolerates up to N)build_errors— count of HARD lake/lean build errors (source-locatederror: file.lean:L:C: …diagnostics), distinct fromsorries. A degraded proof arm should always fall to a caughtsorry; a non-zerobuild_errorsmeans a tactic escaped thefirst | … | sorryfloor andsorriesalone would read as an honest-looking result. Informational only — it does not changepassedor the exit codeuniversal—trueonly when EVERY law theorem in the export is kernel-genuine: its#print axiomsstays inside{propext, Classical.choice, Quot.sound}(sonative_decideandsorrynever count), at least one theorem is explicitly classed universal, and the file has no sorriesuniversal_laws— how many law theorems classed universal passed that same per-theorem axiom whitelist. On a file with sorries the axiom audit does not run at all, so this reports0— pin it together withsorries == 0, not instead of itbounded_laws— how many law theorems the emitter classed bounded-domain (stated only over the finite sample grid, e.g. guardedwhen-law enumerations); these never earn universal credit
Note: a law the emitter declines entirely (no theorem emitted — e.g. a
shape outside every strategy) appears in NEITHER counter; the counters
count emitted law THEOREMS, not verify ... law blocks in the source.
model_panicked— the compiler-model panicked while evaluating a bounded sample; the check fails regardless of budgetsbudget— the active sorry budget
universal_laws and bounded_laws are sourced from the same per-theorem
statement-class markers and the same #print axioms audit the universal
bool keys on. A robust CI budget pins all four together:
sorries == X, universal == true, universal_laws == N,
bounded_laws == M.
Minimizing a proof (--minimize)
aver proof file.av --backend lean -o out/ --check --minimize (Lean-only,
implies --check) rewrites each auto-proof to the tactic that actually closed
it. The auto-prover pins a deterministic first | (t₁) | (t₂) | … | sorry
PORTFOLIO at every law — it cannot know statically which alternative a given
goal needs — and --minimize resolves that hedge against a real build:
- each
firstbranch is prefixed with atrace "AVERMIN:i:b"marker and the project is built ONCE.firsttries branches left-to-right and commits to the first that closes, tracing each it reaches, so the winning branch of portfolioiis the highest indexbthat appears in the build log (failed branches before it trace too — the markers are not rolled back); - each portfolio is collapsed to its winning branch and the project is RE-VERIFIED. If a theorem no longer closes (a mis-parsed winner), it keeps its original portfolio.
So minimization is fail-safe (it can never ship a proof that does not
build) and status-preserving (it keeps exactly the branch Lean committed
to). A law that closes for real drops its alternation and sorry floor and
reads like a hand-written proof; a law that only closed via its floor collapses
to a bare sorry — the honest gap is kept, never silently dropped, so the
sorries / universal numbers from --check are unchanged. It is an opt-in
polishing pass (it costs two extra lake builds), not part of the normal
verify loop.
Refinement records (refinement-via-opaque)
An Aver record X { v: Int } paired with a validating smart constructor
fn fromX(n: Int) -> Result<X, String> whose body is the canonical
match <pred(n)> { true -> Result.Ok(X(v = n)); false -> Result.Err(_) }
shape lifts to a true refinement subtype in Lean:
abbrev Natural := { v : Int // v ≥ 0 }
The predicate from the smart constructor's bool guard rides in the
type itself, so a verify add law commutative over Natural quantifies
directly over the refined type:
theorem add_law_commutative : ∀ (a b : Natural), add a b = add b a := by
intro a b
unfold add fromInt
simp [Int.add_comm, Int.mul_comm]
— one line, instead of the pre-0.22 by_cases h_a : a ≥ 0 / by_cases h_b : b ≥ 0 / unfold / hand-rolled tactic plumbing per law shape. The
lift is automatic, no source-language change. Triggers for single-field
Int carriers; Float and String carriers keep the plain
structure path (IEEE 754 NaN breaks universal float laws, strings have
no universal algebraic structure to exploit).
Refinement records work the same way whether the type is declared in
the entry file or in a dependent module — aver proof natural.av and
aver proof natural_app.av --module-root examples both emit the same
lifted Subtype shape. Pre-0.22 cross-module fell back to the wrapper
shape and lost the one-line universal proof.
verify ... law blocks with a when clause keep the clause as a
theorem premise when it carries information beyond the refinement
type's invariant. A when a >= 10 over Natural (invariant
a.val >= 0) shows up as a real (a.val ≥ 10) -> ... antecedent
on the universal; a redundant when a >= 0 is dropped cleanly so the
universal stays in the one-line ∀ (a : Natural), ... shape.
Current end-to-end examples
These examples are currently smoke-tested end to end with
aver proof --verify-mode auto plus lake build:
examples/formal/law_auto.avexamples/data/fibonacci.avexamples/data/quicksort.avexamples/data/rle.avexamples/data/json.avexamples/core/grok_s_language.avexamples/refinement/natural/natural.av— refinement-via-opaque (Int +>= 0)examples/refinement/positive/positive.av— refinement (Int +>= 1)examples/refinement/int_range/int_range.av— refinement with compoundBool.and(n >= 0, n <= 100)examples/refinement/bigint/bigint.av— refinement overList<Int>+ mutual-rec digit arithmeticexamples/refinement/nonneg_float/nonneg_float.av— Float carrier, structure pathexamples/refinement/email/email.av— String carrier, structure path
Other modules have unit coverage for proof-subset classification and generated Lean snippets, but are not currently listed here as end-to-end smoke cases.
Hard-fail guarantees
Lean codegen does not silently mask unresolved compiler internals:
Expr::Resolvedin codegen input is a hard codegen errorType::Invalidin codegen input is a hard codegen errorsorryis emitted in two situations: explicit--verify-mode sorry, and--verify-mode autouniversal-law theorems whose shape no auto-proof strategy covers (always paired with an inline comment + kernel-checked per-sample theorems alongside, so the obligation stays visible)