Changelog

May 8, 2026 · View on GitHub

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Unreleased

[0.17.13] - 2026-05-08

Fixed (DXIL)

  • PHI node ordering — mem2reg Phase B phi instructions now grouped at top of basic blocks before non-phi instructions. Fixes path_count.wgsl (Vello tilecompute) IDxcValidator "PHI nodes not grouped at top of basic block" error. Root cause: merged StmtEmit runs emitted low-handle regular expressions before high-handle phi expressions.

Added

  • Test coverage waves 3-4: hlsl 70.6%, wgsl/lower 65.3%, msl 64.2%, spirv 76.5%, dxil/emit 37.4%. Overall ~60%.

[0.17.12] - 2026-05-07

Changed

  • ARCH-001: Internal packages refactoring — all backends restructured following DXIL pattern. Implementation moved to internal/codegen/, public API uses real types (not aliases).
    • glsl/internal/codegen/ — 9 files, thin public wrapper
    • msl/internal/codegen/ — 10 files, thin public wrapper
    • hlsl/internal/codegen/ — 15 files, shared namer in internal/backend/
    • spirv/internal/codegen/ — 5 files, thin public wrapper
    • wgsl/internal/parser/ + wgsl/internal/lower/ — parser/lowerer split
    • internal/registry/ — TypeRegistry extracted from ir/
    • internal/textutil/ — shared IndentWriter (DRY across glsl/hlsl/msl)
    • Public API surface reduced from 398 to ~120 symbols

Fixed

  • 13 panics converted to error returns across all backends. GLSL/HLSL exitLoop/exitSwitch stack corruption, SPIR-V emitScalarType/addressSpaceToStorageClass/findTypeHandleByID and 6 more. ~200 caller sites updated.

  • Unsupported stage validation — GLSL and MSL now reject mesh/task shader stages with clear error messages instead of generating invalid code.

Added

  • Test coverage: 12/18 packages ≥80%. Enterprise-quality tests with hand-crafted IR modules, output verification, and regression protection.
    • 100%: internal/textutil, dxil/internal/module
    • 96.6%: internal/backend
    • 92.9%: dxil/passes/mem2reg
    • 88.6%: dxil/passes/dce
    • 83.8%: wgsl/parser, dxil/bitcode
    • 83.0%: dxil/viewid, dxil/passes/sroa
    • 81.1%: ir
    • 80.6%: glsl/codegen, 80.8%: dxil/container

[0.17.11] - 2026-05-06

Fixed (DXIL)

  • Array-of-vector flattening (BUG-DXIL-041). array<vec4<f32>, N> locals caused "Invalid record" from IDxcValidator. Three coordinated bugs: (1) GEP index not scaled by vector width, (2) single-scalar load instead of multi-scalar vector load, (3) garbage component IDs from missing pendingComponents. Fixed via tryLoadVectorFromFlatArray() and scaleIndexForVecArray(). Reported by wgpu agent via fine.wgsl (tilecompute blend stack).

Metrics

  • gg production: 58/59 → 61/61 (0 failures) 🏆
  • DXC golden diff=0: 105 (unchanged)
  • Line parity: 55.1% (unchanged)
  • IDxcValidator: 161/170 (unchanged)

[0.17.10] - 2026-04-30

Added (DXIL)

  • Workgroup struct decomposition — struct-typed groupshared variables decomposed into per-member globals with MSVC-mangled suffix names.
  • ViewID ExprAlias/ExprPhi — precise post-mem2reg dataflow tracking instead of conservative all-inputs→all-outputs fallback.
  • Instruction scheduling infrastructure — eval-right-first for resource reads, exprLeadsToResourceRead(), leafEmitPriority().
  • StmtEmit range merging — consecutive emit ranges merged for cross-range reassociation.
  • Constant folding — int-to-float casts (sitofp/uitofp), bitcast float→i32 for raw buffer stores, binary op folding infrastructure.
  • Alignment encoding fixlog2(bytes)+1 for store/load/alloca/globalvar.
  • Mul-to-shl in CBV/UAV — byte-offset stride via addMulOrShlInstr().
  • TBAA normalizer — strip LLVM optimization hint metadata.

Metrics

  • DXC golden diff=0: 104 → 105 (+1)
  • Line parity: 54.5% → 55.1% (+0.6pp)

[0.17.9] - 2026-04-30

Added (DXIL)

  • Per-member loadInput DCE — backwards reachability analysis eliminates dx.op.loadInput calls for unused struct input members after inlining.
  • Zero-store local promotion — unassigned struct members resolve to zero constants, eliminating alloca/load chains.
  • Same-type integer cast eliminationbitcast i32↔i32 is no-op.
  • Sub→add canonicalizationsub X, Cadd X, -C for positive constants.
  • Mul→shl strength reductionmul X, 2^Nshl X, N.
  • QuantizeF16 via legacy opsdx.op.legacyF32ToF16/F16ToF32 instead of fptrunc/fpext. Eliminates NativeLowPrecision flag cascade (13 shaders).
  • Int64 flag from emitted bitcode — scans LLVM module, not IR type arena.
  • AtomicInt64OnHeapResource flag — 64-bit atomics on non-workgroup resources.
  • MSVC groupshared name decoration\01?name@@3typeA format for workgroup vars.
  • createHandle legacy path — always use opcode 57, removed unused createHandleFromBinding path.

Fixed (DXIL)

  • Input sigId — uses element index, not register row. Fixes packed inputs.
  • ViewID StartCol — packed linear indexing includes column offset.
  • Cross-argument struct input ordering — reverse signature order globally.

Metrics

  • DXC golden diff=0: 94 → 104 (+10)
  • Line parity: 48.1% → 54.5% (+6.4pp)

[0.17.8] - 2026-04-30

Fixed

  • Function call argument type validation (#66). WGSL lowerer now validates argument count and types against function parameters. Passing vec2<u32> where u32 is expected now produces a clear compile error instead of silently generating invalid shader code that crashes at pipeline creation. Reported by @maxsupermanhd.

  • Mandatory semicolons — 16 places in the parser changed from optional to required semicolons per WGSL grammar. const X: u32 = 42 without ; now errors. For-loop init/update handled via inForHeader context flag.

  • @must_use enforcement — functions with @must_use attribute now reject calls where the result is discarded as a statement.

  • @compute requires @workgroup_size — compute entry points without @workgroup_size attribute now error at compile time.

  • const_assert evaluationconst_assert false; now produces a compile error. Supports bool literals, negation, logical ops, integer comparisons, and named constants. Complex expressions gracefully skipped.

  • @binding/@group pairing — resource variables with @binding but no @group (or vice versa) now error.

  • Zero-sized arrays rejectedarray<T, 0> now produces a compile error per WGSL spec (array size must be positive).

  • Invalid swizzle components — GLSL-only s/t/p/q swizzle names rejected. Mixed namespaces (v.xg, v.rb) rejected. Only x/y/z/w and r/g/b/a accepted, each set must be used consistently.

Validation parity

All 8 fixes verified against Rust naga — identical rejection behavior.

[0.17.6] - 2026-04-23

Fixed (DXIL)

  • Single-store local promotion — eliminates alloca/store/load chains for vertex/fragment output staging, matching DXC's direct storeOutput.
  • Sampler heap after input loads — DXC emit ordering for fragment shaders.
  • HLSL namer suffix — trailing _ on resource names ending with digits or matching HLSL keywords.
  • Input used mask sorted order — correct extended properties metadata for fragment shaders with multiple struct inputs.
  • Fragment input signature ordering — LOC semantics before SV_Position.
  • Raw buffer i32 overload — float loads via i32 + bitcast, matching DXC ByteAddressBuffer convention.
  • Strength reductionurem x, 2^Nand x, (2^N-1) at emit time.

Metrics

  • DXC golden diff=0: 82 → 94 (+12)
  • Line parity: 46.5% → 48.1%

[0.17.5] - 2026-04-23

Added

  • ir.TypeSize() — shared type size calculation matching Rust naga TypeInner::try_size(gctx). Used by wgpu core for late buffer binding size validation (VAL-006). 23 unit tests.

  • ir.StorageFormat.IsUnorm() / IsSnorm() — predicate methods for storage texture format classification, used by DXIL backend for correct component type metadata.

Fixed (DXIL)

  • UNorm/SNorm component typesrgba8unormUNormF32 (14) in metadata.
  • CBV metadata size — actual struct size, not vec4-rounded.
  • Named metadata ordering!dx.resources before !dx.viewIdState.
  • dx.op attribute classification — correct nounwind readonly attributes.

Changed

  • DXC golden normalizer — function declarations and attribute definitions sorted alphabetically, eliminating false-positive ordering diffs.

Metrics

  • DXC golden diff=0: 72 → 82 (+10)
  • Line parity: 45.7% → 46.5%

Notes

  • DXIL validation gate terminology. Prior releases reported "N/N DXC validation" pass counts. The underlying command is dxc.exe -dumpbin, which is a parser-printer (structural parse + human-readable dump), not a full DXIL validator. It catches malformed bitcode (e.g. the "Invalid record" class of bugs fixed in BUG-DXIL-004) but does NOT cross-check ABI-level metadata against D3D12 runtime expectations (e.g. PSV0 ShaderStage byte against the pipeline slot). Genuine validation via IDxcValidator::Validate() — including a three-layer defensive wrapper that prevents dxil.dll AV on any input — landed in 0.17.4 (see cmd/dxilval, internal/dxcvalidator, and BUG-DXIL-VALIDATOR-REAL entries below). CHANGELOG/README/ROADMAP wording in past entries has been left as-is for historical accuracy; new entries distinguish "DXC parse" from "IDxcValidator real validation".

[0.17.4] - 2026-04-21

Added

  • cmd/dxilval — Pure Go DXIL validation CLI backed by IDxcValidator (FEAT-DXIL-010 + BUG-DXIL-VALIDATOR-REAL, v0.17.4). First-ever Pure Go integration with Microsoft's IDxcValidator (dxil.dll), zero CGO. Three modes: dxilval shader.dxil validates a single container, dxilval --wgsl shader.wgsl compiles through naga and validates each entry point, dxilval --corpus dir/ walks a directory and reports a typed-error summary. Internal internal/dxcvalidator package wraps IDxcValidator via syscall + a custom IDxcBlob implemented through syscall.NewCallback, spawns a fresh OS thread via kernel32!CreateThread (mandatory — dxil.dll's thread-local allocator is set up in DLL_THREAD_ATTACH, which Windows only fires for threads created AFTER LoadLibrary), and falls back to Windows 10 SDK paths when dxil.dll is not on PATH. The empirical "first-ever VALID (S_OK)" check on the golden tmp/min1_final.dxil fixture is now a permanent unit test (TestSmokeValidateGoldenFixture) instead of a one-off PoC.

  • internal/dxcvalidator — three-layer defensive validation stack (BUG-DXIL-VALIDATOR-REAL, v0.17.4). Any blob handed to dxil.dll passes through a staged defence that prevents the validator-AV classes documented in Phase 0 research:

    • Layer 0 — emitter-side assertion. dxil/internal/emit/emitter.go now refuses to emit a container when the entry function is unset. Without this guard, the BUG-DXIL-012 regression class would write !dx.entryPoints[0][0] = null, which causes IDxcValidator to AV at dxil.dll+0xe9da (NULL+0x18) in its entry-point walker. Any future regression becomes an attributable Go error instead of a silent process crash.
    • Layer 1 — PreCheckContainer structural check (FEAT-VALIDATOR-PRECHECK-001). internal/dxcvalidator/precheck.go walks the DXBC container at fixed offsets and rejects truncated blobs, bad magic, bad part counts, malformed part headers, missing DXIL / ILDB / PSV0 / ISG1 / OSG1 parts, invalid PSV0 stage bytes, and empty entry-function-name strings. Ten typed sentinel errors (errors.Is-switchable), every branch exercised by unit tests, runs before the HeapAlloc copy so rejection costs nothing.
    • Layer 2 — bitcheck.Check bitcode metadata walker (FEAT-VALIDATOR-BITCHECK-001). internal/dxcvalidator/bitcheck/ is a minimal Pure Go LLVM 3.7 bitstream reader mirroring dxil/internal/bitcode/writer.go one-to-one. It walks just far enough to find the !dx.entryPoints named metadata and verify each entry-point tuple has a non-null function reference in operand 0. Skips non-metadata blocks via block-length fast-forward. Scoped specifically to the BUG-DXIL-012 AV class — not a general-purpose LLVM bitcode parser. ~2500 LOC including tests, five typed sentinel errors, 72.3% line coverage, DXC abbreviation-decoding implemented and exercised via hand-assembled fixtures. A minimal real DXC integration fixture is deferred to FEAT-VALIDATOR-BITCHECK-002.

    The wrapper is also defensive against sporadic dxil.dll misbehaviour observed during corpus walking — the validator occasionally returns S_OK with a NULL IDxcOperationResult on some inputs (debug-symbol-large-source.wgsl), which the wrapper now catches and surfaces as a clean typed error instead of dereferencing NULL through the COM vtable.

  • DXIL: PSV0 signature element generalization for all binding kinds (BUG-DXIL-019 follow-up, v0.17.4). buildGraphicsPSVSigs / makePSVSignatureElement now cover every graphics-stage I/O binding: LocationBinding (inputs and non-fragment outputs → arbitrary TEXCOORD with real location-based index; fragment color outputs → SV_Target with location as semantic index) plus the full system- value set (BuiltinFrontFacing, BuiltinSampleIndex, BuiltinSampleMask, BuiltinClipDistance, BuiltinPrimitiveIndex, BuiltinViewIndex). Interpolation mode mapping covers all DXIL InterpolationMode enum values (Constant / Linear / LinearNoperspective / Centroid / Sample variants). Per-side (input/output) start-row tracking replaces the previous always-zero StartRow. Refactor into psvSemanticForBinding / psvSemanticForBuiltin / psvSemanticForLocation / psvInterpolationMode / psvComponentType helpers for readability.

  • DXIL: Options.BindingMap — public API for remapping WGSL @group/@binding to DXIL (space, register), mirroring hlsl.Options.BindingMap. Required for wgpu root signatures, which use monotonic per-class counters (SRV=t0,t1,… / UAV=u0,u1,… / CBV=b0,b1,…). Without a map, behavior is unchanged (raw WGSL numbers used as absolute DXIL registers — backward compatible). New public types: dxil.BindingLocation, dxil.BindTarget, dxil.BindingMap.

Fixed

  • DXIL: read-only storage buffers classified as SRVvar<storage, read> now lowers to SRV (t-register, ByteAddressBuffer), while var<storage, read_write> stays UAV (u-register, RWByteAddressBuffer), matching the HLSL backend. Previously all SpaceStorage globals became UAV, causing register collisions in pipelines that mix read-only and read-write storage buffers (e.g. particle sim pin/pout). Also fixes latent bug in resourceKind where SRV storage buffers fell through to Texture2D metadata kind.

  • DXIL: SRV storage compute path — binding arrays and struct vector loads (BUG-DXIL-004). Two fixes needed after the SRV classification change to unblock real compute workloads:

    • resolveBindingArrayUAVChainFromGV was still gated on class == UAV, so binding_array<T> over var<storage, read> fell through to a generic scalar load path (binding-buffer-arrays regressed 163/163 → 162/163). Relaxed to isStorageBufferClass, matching the other nine resolveUAV* helpers.
    • Pre-existing "Invalid record" in @compute + vector-field-of-local-struct loads, exposed by particles sim: loading p.vel where p is a local Particle{pos: vec2, vel: vec2} fell through to generic emitLoad, which emitted a single scalar load float and never set pendingComponents. Downstream emitBinaryVectorized/getComponentID grabbed the adjacent cbufferLoadLegacy struct-typed result as an f32 operand, corrupting the bitcode. Fixed by decomposing vector struct-field loads into N per-component GEP + scalar load with proper pendingComponents tracking.
    • Regression corpus: compute-storage-read-rw.wgsl (canonical SRV storage compute) and compute-storage-struct-read-rw.wgsl (minimal particles reproducer). TestDxilValSummary: 165/165 (100%), up from 163/163.
  • DXIL: DCE pass (dead code elimination) — mark-and-sweep pass removes dead locals, dead control flow, dead pure function calls, and dead resources from the IR before DXIL emission. Matches the DxilLinker.cpp optimization pipeline used by DXC.

  • DXIL: SROA pass (scalar replacement of aggregates) — struct-typed local variables are decomposed into per-member locals, enabling mem2reg promotion of individual fields. Init-only locals emit their initializer directly and skip alloca entirely.

  • DXIL: mem2reg Phase B — if/switch phi insertion with SSA construction for promoted locals. Extends the existing Phase A (straight-line promotion) with control-flow-aware phi placement.

  • DXIL: inline pass improvements — alias aggregate/opaque args (struct, texture, sampler) instead of copying; two-tier inline policy (pure simple helpers inlined automatically); early-return wrapping via loop+break pattern for helpers with multiple returns; arg-spill StmtEmit coverage for mem2reg promotion of spilled arguments.

  • DXIL: PrefixStable register packing — register allocation now matches the DXC algorithm: sort by class priority (UAV > SRV > CBV > Sampler), then by declaration order within each class, producing identical ISG1/OSG1/PSG1 row assignments.

  • DXIL: createHandle class priority ordering — handle creation order now follows DXC convention (UAV > SRV > CBV > Sampler), fixing validation mismatches in shaders with mixed resource types.

  • DXIL: barrier mode flags + noduplicate — barrier intrinsics now emit proper DXIL mode flags and carry the noduplicate function attribute, matching DXC output exactly.

  • DXIL: fast-math flags + operand canonicalization — arithmetic instructions carry fast-math flags matching DXC defaults; commutative binary ops follow LLVM InstCombine canonicalization (constants to RHS); Reassociate pass for commutative chains.

  • DXIL: loadInput reverse ISG1 row order — input loads now emit in reverse ISG1 row order matching DXC, with component-level DCE eliminating unused loads.

  • DXIL: raw buffer i32 overload + bitcast — float stores to raw buffers use the i32 bufferStore overload with bitcast, matching DXC behavior instead of emitting a non-existent float overload.

  • DXIL: post-DCE shader model version auto-upgrade — after dead code elimination, the emitter re-scans used intrinsics and upgrades the SM version if remaining code requires a higher minimum (e.g. SM 6.0 → 6.2 for wave ops).

  • DXIL: IR-based input signature Used mask — ISG1 Used column now computed from IR reachability analysis rather than always marking all inputs as used.

  • DXIL: CBV typed struct members — uniform buffer types now emit LLVM struct members with proper types ([4 x <4 x float>] for mat4x4) instead of raw byte arrays, matching DXC's hostlayout.struct convention.

  • DXIL: HLSL-style type naming — DXIL metadata and type names now use HLSL-compatible naming (class.Texture2D, hostlayout.struct.Uniforms), matching DXC for golden diff parity.

  • DXIL: sampler heap SRV ordering + configurable bindings — sampler heap entries follow SRV class ordering; Options now supports configurable sampler bindings for custom root signature layouts.

  • DXIL: per-component ViewID taint propagation — output signature ViewID dependencies are tracked per-component through the dataflow graph, matching DXC's PSV0 ViewID taint analysis.

  • DXIL: dead resource elimination — unreachable global resources (detected via reachable globals analysis after inlining) are excluded from handle creation and metadata tables.

  • DXIL: input signature extended-properties null mask — ISG1 extended properties table uses null entries for inputs that carry no extra metadata, matching DXC's sparse encoding.

  • DXIL: TestDxilDxcGolden — new SPIR-V-style golden parity test comparing naga DXIL output against DXC reference across 182 shaders. Reports line-level parity percentage and exact-match (diff=0) counts.

  • DXIL: TestDxilValGGProduction — regression guard validating all 57 gg production entry points through IDxcValidator on every test run.

Metrics (DXIL hardening batch)

  • IDxcValidator: 152/170 → 161/170 (94.7%, +9 shaders)
  • DXC golden diff=0: 24 → 72 (+48 shaders exact match)
  • DXC golden line parity: 37.0% → 45.7% (+8.7pp)
  • gg production: 50/57 → 57/57 (100%, all entry points VALID)
  • Visual: black screen → circles + text rendering on D3D12 DXIL pipeline
  • Text backends: 100% unchanged (MSL/HLSL/GLSL/SPIR-V untouched)

[0.17.3] - 2026-04-11

Added

  • DXIL: CBV (Constant Buffer) loadsdx.op.cbufferLoadLegacy for var<uniform>. Register index calculation (byteOffset/16), component extraction via extractvalue. Supports f32/i32/f64/i64/f16 overloads, struct member access at arbitrary offsets.

  • DXIL: Compute shader support (Phase 2)@compute entry points now compile to DXIL:

    • Thread ID builtins: dx.op.threadId, dx.op.groupId, dx.op.threadIdInGroup, dx.op.flattenedThreadIdInGroup
    • numthreads metadata from @workgroup_size(X,Y,Z)
    • UAV storage buffers: dx.op.bufferLoad/dx.op.bufferStore for var<storage, read_write>
    • Atomic operations: dx.op.atomicBinOp (add, subtract, and, or, xor, min, max, exchange), dx.op.atomicCompareExchange, atomic load/store
    • Barriers: dx.op.barrier with storage/workgroup/subgroup flag mapping
    • Reference: Mesa nir_to_dxil.c

Fixed

  • DXIL: bitcode binary operation opcodesBinOpKind constants used LLVM IR enum numbering (FAdd=1) instead of bitcode unified opcodes (Add/FAdd=0). DXC decoded our FAdd as FSub. Fixed to match Mesa dxil_module.h encoding.

  • DXIL: finalize() operand remappingfinalize() remapped ALL instruction operands as value IDs, corrupting type IDs, opcodes, alignment values, and basic block indices. Added valueOperandIndices() that precisely identifies which operands are values per instruction type.

  • DXIL: alloca alignment encoding — Was log2(bytes), should be log2(bytes)+1 per LLVM 3.7 / Mesa dxil_emit_alloca().

  • DXIL: vector local variable scalarization — Single alloca for vec4<f32> replaced with per-component allocas. Store/Load now operate on correct components.

  • DXIL: GEP struct access — Added getelementptr (FUNC_CODE_INST_GEP=43) for struct member access from local variable pointers. Nested struct access with flat offset computation. Struct store decomposed into per-scalar-field GEP + store.

  • DXIL: retail hash wired upComputeRetailHash() (INF-0004 modified MD5) was implemented but never called. Now used when UseBypassHash=false.

  • DXIL: push constants as CBVSpacePushConstant and SpaceImmediate globals classified as CBV resources with synthetic bindings.

  • DXIL: resource metadata rewrite — Per-class metadata matching Mesa exactly. CBV fields[6] = buffer size (was resource kind), SRV/UAV 9-11 fields with element type tags, fields[1] = undef pointer (was null). Fixes 23 DXC validator crashes.

  • DXIL: struct return decomposition — Multi-output shaders (struct returns with multiple @location fields) now decompose into per-field GEP + scalar load + storeOutput.

  • DXIL: binary op vector scalarization — Vector binary operations decomposed into per-component scalar ops with scalar-vector broadcast.

  • DXIL: global variable allocas — Non-resource globals (workgroup, private) get proper alloca pointers instead of placeholder values.

  • DXIL: array/matrix CBV loads, dynamic GEP, UAV constant-index fix — Matrix CBV loads (one cbufferLoadLegacy per column), array local variable allocas, dynamic Access with GEP, UAV constant-index access fix. 3 more shaders pass DXC.

  • DXIL: SRV/UAV direct loads, dynamic CBV index, array output decomposition — SRV/UAV loads routed to bufferLoad (not LLVM load), dynamic CBV index with stride arithmetic, ZeroValue/Compose dynamic array access, array-typed builtin outputs.

  • DXIL: typed undef for bufferStore, deep UAV chains, entry-block allocas — Float bufferStore uses typed undef (f32, was i32). Deep UAV pointer chains (struct-wrapped arrays, nested Access). All allocas in entry block (was lazy).

  • DXIL: mesh shader intrinsics (SM 6.5) — SetMeshOutputCounts (168), StoreVertexOutput (171), StorePrimitiveOutput (172), EmitIndices (169). All 4 mesh shaders pass DXC (9 entry points). PSG1 primitive signatures.

  • DXIL: struct-typed entry point arguments — Fragment/vertex shaders with struct inputs (e.g., vertex: VertexOutput) now correctly load per-member with row tracking. Fixes 14 additional shaders.

  • DXIL: helper function emission with per-function value ID isolation — Each helper function gets independent value ID space. collectCalledFunctions() pre-scans entry point.

  • DXIL: switch statements — Cascading icmp eq + conditional branches with merge block.

  • DXIL: 17 pack/unpack math functions — pack4x8snorm/unorm, unpack4x8snorm/unorm, pack2x16float/snorm/unorm, unpack variants, pack4xI8/U8/clamp, unpack4xI8/U8.

  • DXIL: matrix operations — matvec, vecmat, matmat, mat+/-mat, matscalar, transpose. All scalarized to component-wise DXIL instructions (dot products for multiply).

  • DXIL: workgroup atomics — LLVM atomicrmw/cmpxchg for workgroup variables (add, sub, and, or, xor, min, max, xchg, compare-exchange).

  • DXIL: vector select scalarization, math broadcast, ExprArrayLength, helper vector returns, struct member component stores, atomic compare-exchange result struct, abstract literals, matrix alloca, refract/modf/frexp/quantizeF16 — systematic fixes across emitter.

  • DXIL: atomic type width support (i32/i64/f32) — workgroup atomics now use correct type width instead of hardcoded i32. Fixes f32/i64 atomic shaders.

  • DXIL: ExprOverride + ProcessOverrides — pipeline override constants now compile. DXIL test harness processes overrides (same as SPIR-V/HLSL).

  • DXIL: matrix column extraction, f16 constant encoding — AccessIndex on matrix returns full column vector (was single scalar). F16 constants use IEEE 754 half-precision.

  • DXIL: typed zero constants, CBV/UAV resource pass-through, struct field loads — f16/i64 zero constants use correct types. CBV resource AccessIndex pass-through. UAV struct field byte offset + scalar type resolution. Multi-register struct loads.

  • DXIL: FRem lowering — LLVM FRem lowered to a - b * floor(a/b) (DXC rejects FRem).

  • DXIL: texture intrinsicsdx.op.getDimensions (72) for textureDimensions/numLevels/numSamples/numLayers, dx.op.textureLoad (66) for imageLoad, dx.op.textureStore (67) for imageStore.

  • DXIL: complex UAV access chains, array load/store decomposition, workgroup uniform load — Matrix column+component UAV access, multi-member struct arrays, 512-element array copy, StmtWorkGroupUniformLoad as barrier+load+barrier pattern.

  • DXIL: 8 texture sampling intrinsics — OpSample(60), OpSampleBias(61), OpSampleLevel(62), OpSampleGrad(63), OpSampleCmp(64), OpSampleCmpLevelZero(65), OpTextureGather(73), OpTextureGatherCmp(74). Previously only OpSample.

  • DXIL: binding array dynamic handlesdx.op.createHandle with dynamic index for binding_array<T> resources. Both ExprAccess and ExprAccessIndex paths.

  • DXIL: NumWorkGroups via synthetic CBV$Globals CBV with cbufferLoadLegacy for compute dispatch dimensions (DXIL has no intrinsic, matches DXC approach).

  • DXIL: ray query intrinsics (SM 6.5) — 35 new opcodes (178-212): allocateRayQuery, traceRayInline, proceed, candidateType, committedStatus, all intersection getters. RayIntersection struct (34 components). Auto SM upgrade to 6.5.

  • DXIL: image atomics — StmtImageAtomic via dx.op.atomicBinOp/atomicCompareExchange with texture handles and spatial coordinates.

  • DXIL: wave/subgroup operations (SM 6.0+) — 13 wave intrinsics: waveGetLaneIndex(111), waveGetLaneCount(112), waveAnyTrue(113), waveAllTrue(114), waveActiveBallot(116), waveReadLaneAt(117), waveReadLaneFirst(118), waveActiveOp(119), waveActiveBit(120), wavePrefixOp(121), quadReadLaneAt(122), quadOp(123).

  • DXIL: DXC dumpbin validation163/163 testable shaders pass DXC dumpbin (100%). Zero val_fail. Zero compile_fail. 2 expected fail (no entry points). ALL 6 backends at 100%. World's first Pure Go DXIL generator at full validation.

Fixed (other backends)

  • *SPIR-V: OpIMul for vec4<f32>f32 after unpack4x8unorm (BUG-SPIRV-003)resolveMathType() missing return types for 10 pack/unpack functions. unpack4x8unorm(u32) returned u32 instead of vec4<f32>, causing OpIMul instead of OpVectorTimesScalar. Fixes gg#252, naga#61.

Changed

  • SPIR-V Rust reference: allow-list for intentional divergences — Unified allow-list across all backends (SPIR-V/MSL/HLSL/GLSL) for shaders where our output intentionally differs from Rust naga (Workgroup layout-free types per VUID-StandaloneSpirv-None-10684, no-compact-pass for entry-point-less shaders). 0 fail across all backends.

  • SPIR-V validation: 165/165 — Added ptr-deref-test shader (+1 from v0.17.2).

[0.17.2] - 2026-04-10

Fixed

  • SPIR-V: OpFunctionCall type mismatch with Workgroup values (BUG-SPIRV-002) — Regression from v0.17.1 Workgroup fix. Values loaded from Workgroup variables had layout-free types that propagated into OpFunctionCall arguments, causing type mismatch with decorated function parameters. Fix: insert OpCopyLogical immediately after every OpLoad from Workgroup (in emitLoad, emitAccess, emitAccessIndex), converting layout-free → decorated at the load point. Guard maybeCopyLogicalForStore against double-conversion when value is already decorated. Validated: 164/164 naga shaders + 23/23 gg GPU shaders pass spirv-val v2026.1. (gogpu/wgpu#134, reported by @SideFx via gogpu/ui#67)

[0.17.1] - 2026-04-08

Fixed

  • SPIR-V: Workgroup ArrayStride violation (VUID-StandaloneSpirv-None-10684) — SPIR-V backend emitted ArrayStride decoration on array types used in Workgroup storage class, which is forbidden without SPV_KHR_workgroup_memory_explicit_layout. Intel Vulkan silently accepted this, but Qualcomm Adreno correctly rejected it — causing invisible text on Snapdragon X Elite. Fix: emit separate layout-free type declarations for Workgroup variables (arrays without ArrayStride, structs without Offset/MatrixStride). Uses OpCopyLogical (SPIR-V 1.4+) to bridge Workgroup ↔ Storage type mismatches. Constant deduplication added to prevent duplicate OpConstant IDs that break OpCopyLogical type equivalence. Rust naga has the same bug (gfx-rs/wgpu#7696, fix PR #9295 still open). (gogpu/wgpu#134, reported by @SideFx via gogpu/ui#67)

[0.17.0] - 2026-04-06

Added

  • DXIL backend (experimental) — First Pure Go DXIL generator. Direct DXIL generation from naga IR for DX12 Shader Model 6.0. Eliminates FXC/DXC compiler dependency entirely. Zero CGO, zero external dependencies. Verified: 2400+ frames at 60 FPS on D3D12 (Intel Iris Xe). Rust naga has not implemented this (open issue since 2020). 190 tests, ~12.5K LOC. Public API: dxil.Compile(), dxil.DefaultOptions().
    • LLVM 3.7 bitcode writer with VBR encoding, nested blocks, abbreviation records
    • DXBC container with DXIL, ISG1, OSG1, PSV0, SFI0, HASH parts
    • BYPASS hash (Agility SDK 1.615+, January 2025) — no dxil.dll signing needed
    • Full expression lowering: literals, binary/unary ops, compose, access, splat, select, load/store, swizzle, derivatives, relational, type casts (10 LLVM cast opcodes), math intrinsics (30+ functions including min/max/clamp/dot/ cross/mix/fma/smoothstep/pow/length/distance/normalize)
    • Control flow: LLVM basic blocks with br/br_cond, loops with back edges, break/continue via loop context stack, BreakIf support
    • Local variables: alloca + load + store with proper bitcode serialization
    • Resource bindings: dx.op.createHandle(57) for CBV/SRV/Sampler, dx.op.sample(60) for texture sampling, dx.resources metadata
    • I/O signatures: ISG1/OSG1 with semantic mapping (SV_Position, TEXCOORD, SV_Target, SV_VertexID, SV_InstanceID, SV_IsFrontFace, SV_Depth)
    • Pipeline state validation: PSV0 with runtime info, shader stage, wave lanes
    • Vector scalarization: DXIL has no native vectors — vec4 = 4 separate values, per-component tracking via exprComponents map
    • dx.op intrinsic system: overload-typed function declarations, lazy caching
    • Value ID remapping: emitter local IDs → serializer global numbering via finalize()
    • Error hardening: unsupported features return errors (not silent skip)

[0.16.6] - 2026-04-05

Performance

  • TypeRegistry zero-alloc lookups — Refactored normalizeType() to appendTypeKey() using keyBuf []byte with inline string(keyBuf) map index (Go compiler optimization). Eliminates heap string allocation per type lookup. large_pbr: -59 allocs/op.
  • Lexer token preallocation — Token slice estimate len(source)/4 (was /6). Prevents slice regrowth for typical shaders.
  • SPIR-V Backend reuseBackend.Reset() + ModuleBuilder.Reset() clear all state without deallocating (Go 1.21+ clear() keeps map capacity). Compile() calls Reset() internally. Small shaders: 4.9× fewer allocs (68→14), 14× fewer bytes (17KB→1.2KB). Reuse benchmarks included.
  • Lowerer/Parser pre-sizing — Expression, statement, declaration slices pre-allocated based on AST size estimates. TypeRegistry capacity hints.
  • Overall: 594 → 562 allocs/op (-5.4%). SPIR-V reuse: 68 → 14 allocs (4.9×).

[0.16.5] - 2026-04-05

Changed

  • Dead code removal — Removed 3 unused functions found by full codebase audit (398 symbols scanned across 6 packages): flattenBinding (glsl), concretizeTypeInner (ir/compact), resolveLiteralTypeInner (ir/process_overrides).

[0.16.4] - 2026-04-04

Fixed

  • GLSL: per-element loop for workgroup array zero-init — Same fix as HLSL: arrays >= 256 elements use for loop instead of inline constructor list (Type[256](elem, elem, ... ×256) = 12KB single line). Prevents GL driver compiler slowdown and potential crashes on mobile/embedded GL drivers.

[0.16.3] - 2026-04-04

Fixed

  • Per-element loop for workgroup array zero-init — FXC hangs 22 seconds on (StructType[256])0 bulk assign. Replaced with per-element for loop: for (uint _naga_zi_0 = 0u; ...) { arr[_naga_zi_0] = (ElementType)0; }. Compilation time: 22s → 68ms. Handles nested arrays recursively. First implementation in the industry to fix this — Rust naga, ANGLE, Dawn all have the same bug or workaround it differently. See docs/dev/research/HLSL-ZERO-INIT-FXC-HANG-RESEARCH.md.
  • Defaults changed to match RustForceLoopBounding: true, RestrictIndexing: true (were both false).

HLSL backend: parity fixes

  • Dot4I8Packed/Dot4U8Packed — dot4add intrinsics (SM 6.4+) with polyfill. (3 shaders)
  • Pack4xI8/U8/Clamp + Unpack4xI8/U8 — shift+mask polyfills. (2 shaders)
  • LiteralF16 — float16_t with h suffix. (2 shaders)
  • ValuePointerType — storage access stride computation. (1 shader)
  • StmtImageAtomic — InterlockedXxx on RWTexture coordinates. (2 shaders)
  • ExprWorkGroupUniformLoadResult — barrier + typed load + barrier. (1 shader)
  • ExprBinary in global scope — const binary expressions. (1 shader)
  • Float exponent formatting — removed spurious '+' in exponents. (1 shader)
  • naga_extractBits/naga_insertBits — per-type overloads matching Rust. (1 shader)
  • Pack/Unpack 2x16/4x8 snorm/unorm/float — all 10 inline polyfills. (1 shader)
  • f16 infrastructure — NagaConstants struct, matrix padding, storage Store patterns. (1 shader)
  • Demand-driven StorageLoadHelpers — match Rust approach. (1 shader)

[0.16.1] - 2026-04-04

Fixed

SPIR-V binary validation: 29 → 0 failures, 16 → 0 compile failures (164/164 pass)

All fixes validated against Rust naga reference (naga/src/back/spv/). SPIR-V TestRustReference: 4/87 → 90/93 (3 minor structural gaps, not bugs).

  • Depth texture sampling — Dref sampling uses scalar f32 result type; non-Dref depth sampling uses vec4 then CompositeExtract for first component. Matches Rust naga image.rs:823-837. (4 shaders)

  • Function pointer argumentsisPointerExpression now recognizes pointer-typed FunctionArgument params. emitCall uses emitPointerExpression for pointer parameters. (1 shader: access)

  • Integer dot product — OpDot only works on float vectors. Integer dot products now use manual expansion (CompositeExtract + IMul + IAdd). Matches Rust naga write_dot_product. (1 shader)

  • FMix scalar-to-vector splatting — When FMix selector is scalar but operands are vectors, splat scalar via OpCompositeConstruct. Matches Rust naga Mix handling. (2 shaders)

  • Integer vec×scalar multiply — OpIMul requires matching types. Scalar operand splatted to vector via OpCompositeConstruct before IMul. Matches Rust naga write_vector_scalar_mult. (1 shader)

  • Matrix add/sub decomposition — SPIR-V FAdd/FSub don't work on matrix types. Decomposed into column-wise vector operations. Matches Rust naga write_matrix_matrix_column_op. (1 shader)

  • VectorTimesMatrix result type — Result vector size must equal matrix column count, not left vector size. Also fixed MatrixTimesMatrix for non-square matrices. (1 shader)

  • ImageQuerySize dimensionality — Result type now matches image dimensionality (2D→vec2, 3D→vec3, etc.) instead of always vec3. ImageQueryNumLayers uses OpImageQuerySizeLod + CompositeExtract. Matches Rust naga image.rs:1142-1210. (1 shader)

  • Bitcast width conversion — Width-changing integer casts now use OpUConvert/OpSConvert instead of OpBitcast (which requires equal bit width). Matches Rust naga block.rs:2061-2066. (1 shader + 5 bonus)

  • Atomic int64 result types — Atomic operations on int64 now use correct 64-bit scalar result type instead of hardcoded 32-bit. (2 shaders)

  • AtomicCompareExchange struct — OpAtomicCompareExchange returns scalar, not struct. Now constructs {old_value, exchanged} struct via OpCompositeConstruct with OpIEqual for the exchanged bool. Matches Rust naga block.rs:3414+. (2 shaders + 1 bonus)

  • ControlBarrier SubgroupMemory — Fixed MemorySemanticsSubgroupMemory constant (0x80, not 0x200). Barrier emission now matches Rust naga write_control_barrier. (1 shader)

  • ConstOffset image operand — Changed from ConstOffsets (0x40, Gather-only) to ConstOffset (0x08, valid for all sampling). (partial fix)

  • Image store coordinates — Integer coordinates now use proper bitcast instead of float conversion. (2 shaders)

  • External texture handling — ImageClassExternal uses float sampled type and OpImageQuerySizeLod for queries. (1 shader)

  • ImageGatherExtended capability — Removed spurious emission for ConstOffset operands. Only needed for dynamic Offset. (1 shader)

  • spirv-val target env — Uses spv1.6 for general SPIR-V correctness validation, with --uniform-buffer-standard-layout for std430 matrix stride.

  • Pointer function arguments — Copy-in/copy-out (spill) pattern for passing OpAccessChain results to OpFunctionCall. SPIR-V requires memory object declarations (OpVariable/OpFunctionParameter) as pointer arguments. (3 shaders)

  • Image Lod float conversion — SPIR-V ExplicitLod requires float Lod operand. Integer level converted via OpConvertSToF/OpConvertUToF. (2 shaders)

  • Binding arrays RuntimeArray — Proper wrapping in Block-decorated struct. (1 shader)

  • Mesh shaders — MeshEXT/TaskEXT execution models, SPV_EXT_mesh_shader extension, OutputVertices/OutputPrimitivesEXT execution modes. (4 shaders)

  • Pipeline overrides (OpSpecConstant) — ExprOverride expressions emitted as OpSpecConstant/OpSpecConstantComposite with SpecId decoration. (3 shaders)

  • Image atomics (OpImageTexelPointer) — StmtImageAtomic emits OpImageTexelPointer then standard atomic ops on the texel pointer. (2 shaders)

  • WorkGroupUniformLoad — Emitted as ControlBarrier + OpLoad + ControlBarrier matching Rust naga's pattern. (1 shader)

  • Pack4xI8/U8/Clamp + Unpack4xI8/U8 — Shift+mask pack/unpack polyfills for 4×8-bit integer packing. (5 shaders)

  • ProcessOverrides remap fixes — StmtAtomic.Compare and StmtRayQuery fields now correctly remapped during override processing. (2 shaders)

Known Issues

  • 0 SPIR-V binary validation failures. All 164 shaders compile and pass spirv-val.

  • 3 SPIR-V structural differences from Rust reference (Int8 capability, decoration counts) — not validation or compilation issues.

[0.16.0] - 2026-04-02

Added

  • GLSL: TextureMappings in TranslationInfo — Reports texture-sampler binding pairs for combined sampler2D, enabling GLES HAL SamplerBindMap construction. Matches Rust naga ReflectionInfo pattern.

Fixed

SPIR-V binary validation: 63 → 29 failures (−34 fixed)

All fixes validated against Rust naga reference (naga/src/back/spv/).

  • SPV_KHR_storage_buffer_storage_class extension — StorageBuffer storage class requires this extension for SPIR-V < 1.3 (default version is 1.1). Matches Rust naga writer.rs:2580. (5 shaders)

  • Workgroup ArrayStride decoration — Workgroup variables must not have explicit layout decorations per VUID-StandaloneSpirv-None-10684. Added layout-free type emission for Workgroup address space. (1 shader)

  • OpLoad on runtime-sized arrays — SPIR-V forbids loading entire runtime-sized arrays. Global variables containing runtime arrays now return pointers instead of loaded values. (2 shaders)

  • OpImageFetch result type — Hardcoded vec4 replaced with correct type derived from image SampledKind. (1 shader)

  • SPV_KHR_multiview, SPV_KHR_fragment_shader_barycentric, SPV_KHR_integer_dot_product extensions — Added when corresponding capabilities are used. (6 shaders)

  • CapabilityImageGatherExtended — Added for textureGather with offset. (2 shaders)

  • SampleMask BuiltIn — Wrapped in array<u32,1> per SPIR-V spec. (1 shader)

  • SPIR-V version 1.3 requirement — RequireVersion(1.3) for subgroup/ GroupNonUniform capabilities. (1 shader)

  • Duplicate OpTypeSampler — Cached to emit exactly once. (3 shaders)

  • ClipDistance/PrimitiveIndex builtins — Added to builtinToSPIRV mapping. (1 shader)

  • OpAtomicFAddEXT (6035) — Float32 atomics use OpAtomicFAddEXT instead of OpAtomicIAdd. Atomic scalar resolution returns kind+width for int64. (4 shaders)

  • AtomicCompareExchange unequal semantics — Unequal operand uses Acquire (not AcquireRelease) per VUID-StandaloneSpirv-UnequalMemorySemantics-10875. (3 shaders)

  • ExecutionModeDepthReplacing — Fragment shaders writing FragDepth now emit DepthReplacing per VUID-FragDepth-FragDepth-04216. (2 shaders)

  • SPV_KHR_integer_dot_product version comparison — Fixed 0x10006 → 0x10600 (SPIR-V version encoding is major<<16|minor<<8). (2 shaders)

GLSL

  • Depth texture combined sampler naming — When a depth texture is used with both regular and comparison samplers, the combined sampler names were swapped due to non-deterministic Go map iteration. Fixed with deterministic sorting. (1 shader)

Known Issues

  • 29 SPIR-V binary validation failures at release time. Fixed in Unreleased section above (29 → 6 remaining).

[0.15.2] - 2026-04-01

Fixed

  • HLSL: revert direct sampler kostyl, restore heap pattern — Reverted the direct sampler register binding from v0.15.1. DX12 HAL now implements proper sampler heap.

[0.15.1] - 2026-03-31

Fixed

  • HLSL: revert direct sampler register binding mode — Samplers always use the sampler heap indirection pattern (nagaSamplerHeap[indexBuffer[N]]), matching Rust wgpu-hal architecture. The DX12 HAL now properly implements the global sampler heap with per-bind-group sampler index buffers and provides SamplerBufferBindingMap to naga during shader compilation.

0.15.0 - 2026-03-30

Highlights

  • ALL 5 backends at 100% Rust naga parity — complete exact output match
  • IR Reference: 144/144 (100%) — complete structural match with Rust naga on ALL shaders
  • SPIR-V Backend: 87/87 (100%) — exact output match with Rust naga (was 40/87)
  • MSL Backend: 91/91 (100%) — exact output match with Rust naga
  • GLSL Backend: 68/68 (100%) — exact output match with Rust naga
  • HLSL Backend: 58/58 (100%) — exact output match with Rust naga
  • ir/ test coverage: 82% — up from 24%, with 148 new unit tests
  • 994 golden output files across 4 backends, 164 test shaders
  • Quake 1 renders on gogpu/wgpu Pure Go Vulkan backend (gogpu#157)

Added

IR Level

  • needsPreEmit auto-interruptaddExpression automatically interrupts emitter for Literal, Constant, ZeroValue, GlobalVariable, FunctionArgument, LocalVariable, Override expressions, matching Rust naga's constant_evaluator::append_expr
  • Splat in GlobalExpressions — single-arg vector constructors produce ExprSplat GE
  • ZeroValue in GlobalExpressions — zero-arg constructors produce ExprZeroValue GE
  • Swizzle const-foldvec4(vec2(1,2), vec2(3,4)).wzyx fully evaluated at compile time
  • dot4I8Packed/dot4U8Packed const-fold — packed dot product evaluated at compile time
  • Abstract composite constant inlineABSTRACT_ARRAY[i] inlines array literals
  • Binary const eval → GEvec2(1.0f) + vec2(3.0f, 4.0f) produces GlobalExpressions directly
  • Constant alias GE sharingconst ALIAS = ORIGINAL reuses GE handle
  • Void call emitter restart — proper emitter state after void function calls
  • Matrix column grouping in nested constructor GlobalExpressions
  • As conversion for scalar type mismatchvec4f(u32_value) inserts ExprAs convert

SPIR-V Backend

  • 87/87 (100%) Rust naga parity — exact binary output match on all reference shaders
  • Integer div/mod safety wrappersnaga_div/naga_mod helper functions prevent division by zero and i32 MIN/-1 overflow, matching Rust naga behavior
  • Image bounds checking — Restrict and ReadZeroSkipWrite policies with coordinate clamping for texture load/store operations
  • Ray query helper functions — 6 helper functions per ray query (initialize, proceed, terminate, committed/candidate intersection getters)
  • Force loop bounding — iteration counter prevents infinite loops on malformed shaders
  • Workgroup zero-init polyfill — zero-initializes workgroup memory at entry point start
  • 20+ new capabilities — ClipDistance, Geometry, GroupNonUniform, Float16 storage, AtomicFloat32AddEXT, StorageImageExtendedFormats, SampleMaskPostDepthCoverage, SubgroupBallotKHR, and more
  • NonWritable/Flat/NonUniform decorations — correct propagation for binding arrays and storage buffer access
  • OpLoad dereferencePointerType fix — correct type resolution for pointer loads
  • Float16 constant emission — proper OpConstant for f16 values
  • ModfStruct/FrexpStruct fix — correct result struct types for decomposition functions
  • Saturate FClamp fixsaturate() emits FClamp with 0.0/1.0 bounds
  • f16 I/O polyfill — bitcast-based conversion for f16 entry point interface variables
  • Composite spilling — by-value dynamic indexing spills composites to local variables
  • SSA entry point struct args — correct argument handling for entry point structs
  • Capability-aware dot4 polyfill — software emulation when DotProduct unavailable
  • Entry point interface vars — SPIR-V 1.4+ globals, ForcePointSize decoration

GLSL Backend

  • 68/68 (100%) Rust naga parity — exact text output match on all reference shaders
  • dominates_global_use reachability — correct global variable emission per entry point
  • ProcessOverrides — pipeline constant specialization for GLSL output
  • Image bounds checking — coordinate clamping for texture operations

MSL Backend

  • Vertex Pulling Transform — complete implementation in msl/vertex_pulling.go: _mslBufferSizes struct, 42 vertex format unpacking functions, buffer type structs, bounds-checked byte unpacking from raw vertex buffers
  • External Texture SupportNagaExternalTextureWrapper struct, multi-plane YUV sampling (nagaTextureSampleBaseClampToEdge), texture load (nagaTextureLoadExternal), dimensions query (nagaTextureDimensionsExternal), transfer function color space conversion
  • TOML inline table parsingmsl_pipeline = { key = val } format support

WGSL Frontend

  • Ray query supportacceleration_structure, ray_query types, RayDesc/ RayIntersection predeclared structs, RAY_FLAG_* constants, 7 ray query builtins. Full SPIR-V + HLSL + MSL backend emission.
  • Subgroup operationssubgroupBallot, subgroupAdd/Mul/Min/Max/And/Or/Xor, subgroupBroadcast/First, subgroupShuffle*, quadSwap*, subgroupBarrier. Full SPIR-V backend with correct capabilities. HLSL/MSL/GLSL placeholders.
  • Vector const-exprs — component-wise binary operations at module scope (const X = vec2(1.0) + vec2(3.0, 4.0)). Splat expansion for scalar→vector.
  • Override declarations@id(N) override name: type = default;
  • f16/i64/u64/f64 scalar typesenable f16/int64; directives, literal suffixes (1.0h, 42li, 42lu, 1.0lf), type constructors
  • break if syntax — continuing block break if condition;
  • Type aliasesalias FVec3 = vec3<f32>; with constructor support
  • const_assert declarations — compile-time assertions (evaluated as no-op)
  • workgroupUniformLoad builtin — maps to ir.StmtWorkGroupUniformLoad
  • atomicCompareExchange result struct.old_value/.exchanged member access
  • Template list edge cases — trailing commas, >= disambiguation
  • diagnostic directive — top-level diagnostic(...) skipping
  • 164 WGSL test shaders — 144/144 Rust reference (100%) + 20 custom

Changed

SPIR-V Backend

  • Block Ownership Model (NAGA-ARCH-001) — refactored function emission from flat instruction list to block-based architecture matching Rust naga. Each SPIR-V basic block is now a first-class Block struct consumed by FunctionBuilder. LoopContext passed by value for isolated nested loop contexts. Eliminates loopStack/breakStack mutable state and blockEndsWithTerminator() post-hoc checks. Produces identical SPIR-V output — zero behavioral changes.

Performance

GLSL Backend

  • Dead code elimination via entry-point reachability (GLSL-001) — GLSL writer now walks the call graph from the target entry point and only emits reachable types, constants, globals, and functions. SDF fragment shader output reduced from 639KB to target <50KB. Fixes 5-10 second startup delay on GLES backend (naga#42).

Fixed

SPIR-V Backend

  • Workgroup Offset decoration removed (SPIRV-001)emitTypeNoLayout() now handles struct types by creating separate type IDs without Offset, ColMajor, or MatrixStride member decorations. Fixes Vulkan validation error VUID-StandaloneSpirv-None-10684 on all Vello compute shaders. Matches Rust naga global_needs_wrapper() pattern.

MSL Backend

  • Workgroup variables emitted as entry point parameters (MSL-002)var<workgroup> globals now appear as threadgroup T& name parameters in compute entry points. Previously skipped because they have no resource binding. Fixes undeclared identifier 'sh_scratch' on macOS Metal.
  • Barrier calls fully namespace-qualifiedthreadgroup_barrier() and mem_flags now prefixed with metal::. Fixes undeclared identifier 'mem_flags' on strict Metal compilers.
  • Runtime-sized array typedef — dynamic arrays (array<T>) now emit typedef T name[1]; in MSL output. Fixes unknown type name 'type_6' for storage buffer parameters.

0.14.8 - 2026-03-16

Fixed

GLSL Backend

  • Bind group collision in GLSL output@group(0) @binding(0) and @group(1) @binding(0) both generated layout(binding = 0), causing uniform shadowing on GLES. SDF viewport uniform was overwritten by clip uniform, making all SDF shapes invisible. Now flattens to unique GL binding points: group * 16 + binding + base.

0.14.7 - 2026-03-15

Fixed

MSL Backend

  • Multi-group binding index collision@group(0) @binding(0) and @group(1) @binding(0) both mapped to Metal [[buffer(0)]], causing shader compilation failure on macOS. The MSL backend now assigns sequential per-type indices (buffer, texture, sampler) across all bind groups sorted by (group, binding), matching the Rust wgpu-hal approach. When PerEntryPointMap provides explicit mappings, those take priority. (gogpu/gg#209)

0.14.6 - 2026-03-06

Fixed

MSL Backend

  • Pass-through globals for helper functions — textures, samplers, uniforms, and storage buffers used by non-entry-point functions are now passed as extra parameters and arguments; previously MSL helper functions could not access entry point resource bindings, causing undeclared identifier errors (e.g., msdf_atlas, msdf_sampler, sh_scratch) for any shader with helper functions referencing global resources (gogpu/ui#23)

0.14.5 - 2026-03-04

Fixed

MSL Backend

  • Buffer parameters use references (&) instead of pointers (*) — buffer parameters now generate constant Uniforms& u [[buffer(0)]] (reference) instead of constant Uniforms* u [[buffer(0)]] (pointer); pointer syntax required -> or (*u). for member access while the expression writer generates . access, causing Metal compilation errors on Apple Silicon (gogpu/ui#23)

0.14.4 - 2026-03-01

Fixed

MSL Backend

  • Vertex [[stage_in]] for struct-typed arguments — vertex shaders with struct-typed inputs (e.g., fn vs_main(in: VertexInput)) now correctly generate a synthesized _Input struct with [[attribute(N)]] members and [[stage_in]] parameter; previously only fragment stage was handled, causing undefined in_ reference (gogpu/ui#23)
  • metal::discard_fragment() namespacediscard_fragment() now emits with required metal:: namespace prefix; bare call was rejected by Metal shader compiler

0.14.3 - 2026-02-25

Fixed

SPIR-V Backend

  • Deferred store for multiple call results — Variables initialized from expressions containing multiple function call results now correctly emit deferred OpStore instructions for each intermediate result
  • Deferred store for var x = atomicOp() — Atomic operation results used in variable initialization now correctly generate deferred stores instead of losing the value (NAGA-SPV-006)
  • OpLogicalEqual for bool comparisons — Boolean equality expressions now emit correct OpLogicalEqual opcode; transitive deferred stores propagate through boolean comparison chains
  • Atomic result type for atomic<i32> struct fields — Atomic operations on signed integer struct members now use correct OpTypeInt 32 1 result type instead of unsigned
  • Prologue var init splitting — Variable initializations that reference other local variables are now split from the function prologue into StmtStore at the declaration point, preventing use-before-definition in SPIR-V (NAGA-SPV-007)

0.14.2 - 2026-02-22

Added

Test Infrastructure

  • Golden snapshot test system (snapshot/) — compiles 30 WGSL shaders through all 4 backends (SPIR-V, GLSL, HLSL, MSL), compares output to ~118 stored golden files; supports UPDATE_GOLDEN=1 for regeneration
  • 20 new reference shaders — collatz, atomics, workgroup_memory, quad, vertex_colors, uniforms_mvp, multi_output, math_builtins, conversions, swizzle, expressions_complex, structs, arrays, matrices, let_and_var, loops_advanced, switch_advanced, texture_sample, texture_storage, pointers
  • WGSL error case tests (wgsl/wgsl_errors_test.go) — 76 test cases covering parse errors (39) and lowering errors (37): unknown types, unresolved identifiers, missing tokens, wrong builtin argument counts, reserved words
  • IR validator semantic tests (ir/validate_semantic_test.go) — 47 test functions covering type validation, constants/globals, entry points, functions, expressions, statements, and positive edge cases
  • SPIR-V capability tracking tests (spirv/capabilities_test.go) — 13 test functions verifying correct OpCapability emission: Shader always present, Float16/64, Int8/16/64, ImageQuery, DotProduct, no-emit-when-unused, no duplicates
  • SPIR-V disassembler for golden snapshots — extracted from cmd/spvdis/ into reusable test helper, produces diff-friendly text output

Fixed

SPIR-V Backend

  • Deterministic output — replaced map[int]uint32 with []uint32 slices for entry point interface variables and struct member extraction; Go map iteration order was causing non-reproducible SPIR-V binaries

GLSL Backend

  • Emit #extension GL_ARB_separate_shader_objects : enable for desktop GLSL < 4.10 — layout(location) on inter-stage varyings requires this extension; NVIDIA drivers reject generated code without it (#31)

0.14.1 - 2026-02-21

Fixed

HLSL Backend

  • row_major qualifier for matrix struct members in cbuffer/uniform blocks — DX12 M[i] column access was returning rows instead of columns, causing transposed transforms and invisible geometry
  • mul(right, left) argument reversal for row_major matrices — HLSL mul() semantics differ from WGSL * operator when layout changes
  • Unique entry point names — prevent HLSL duplicate function errors when multiple entry points reference the same function
  • Typed call results — function calls now use correct return type instead of void

GLSL Backend

  • Clear namedExpressions between function compilations — expression handle names from one WGSL function were leaking into subsequent functions, causing undeclared identifier errors in GLES shaders

0.14.0 - 2026-02-21

Major WGSL language coverage expansion: 15/15 Essential reference shaders from Rust naga test suite now compile to valid SPIR-V.

Added

WGSL Parser

  • Abstract type constructors without template parameters (vec3(1,2,3), mat2x2(...), array(...))
  • bitcast<T>(expr) template syntax with dedicated AST node
  • binding_array<T, N> type syntax
  • Float literal suffixes without decimal point (1f, 1h)
  • Switch statement: default as case selector, trailing commas, optional colon
  • Increment/decrement statements (i++, i--)

WGSL Lowerer

  • 48 predeclared short type aliases (vec3f, mat4x4f, vec2i, etc.)
  • Struct constructor syntax (StructName(field1, field2))
  • Pointer dereference on assignment LHS (*ptr = value)
  • _ discard identifier in assignments and let bindings
  • modf().fract/.whole and frexp().fract/.exp member access on builtin results
  • bitcast expression lowering
  • Constant expression evaluator for switch case selectors
  • dot4I8Packed / dot4U8Packed packed dot product builtins
  • textureGather, textureGatherCompare, textureSampleBaseClampToEdge
  • texture_depth_2d_array as non-parameterized type
  • textureSampleCompare / textureSampleCompareLevel
  • Global variable type inference from initializer
  • BindingArrayType for descriptor array types

SPIR-V Backend

  • OpTranspose (native SPIR-V opcode 84) with matrix type swap
  • Matrix type caching (prevents duplicate OpTypeMatrix)
  • 25 new math functions: bit manipulation (countOneBits, reverseBits, extractBits, insertBits, firstLeadingBit, firstTrailingBit, countLeadingZeros, countTrailingZeros), pack/unpack (4x8snorm, 4x8unorm, 2x16snorm, 2x16unorm, 2x16float), quantizeToF16
  • OpSDotKHR / OpUDotKHR with SPV_KHR_integer_dot_product extension
  • OpImageGather / OpImageDrefGather with component index
  • OpBitCount, OpBitReverse, OpBitFieldInsert, OpBitFieldSExtract, OpBitFieldUExtract
  • Pointer access chains on function arguments
  • findCallResultInTree extended to 12+ expression types
  • BindingArrayType emission (OpTypeArray/OpTypeRuntimeArray)
  • Identity conversion early return

IR

  • BindingArrayType struct for descriptor array types

Testing

  • 17 reference shader regression tests — 15 Essential + 2 bonus (skybox, water) from Rust naga test suite, embedded as string literals for CI compatibility
  • SPIR-V validation via spirv-val in CI

Fixed

WGSL Frontend

  • Compound assignment (+=, -=) on local variables — removed explicit ExprLoad
  • textureDimensions accepting 1 argument (texture only)
  • >> token splitting for nested template closing (ptr<function, vec3<f32>>)
  • Const with constructor expressions (const light = vec3<f32>(1,2,3))
  • Unary negation in constant expressions (const X = -0.1)
  • let bindings emitted at declaration point for SSA dominance correctness
  • Float literals without trailing digit (1. now parsed correctly)
  • Module-level constants with constructor initializers
  • Switch statement termination analysis for exhaustive matching
  • Trailing semicolons after closing braces no longer cause parse errors
  • Vector type inference from constructor arguments

SPIR-V Backend

  • OpIMul result type for scalar*vector promotion (was using scalar type instead of vector)
  • MatrixStride decoration for uniform matrix members
  • let variable semantics — emit OpLoad for let bindings (value semantics, not reference)
  • OpCapability ImageQuery emitted when using textureDimensions/textureNumLevels
  • Matrix multiply (OpMatrixTimesVector, OpVectorTimesMatrix, OpMatrixTimesMatrix) type handling
  • Deferred OpStore for variables initialized from complex expressions
  • Vector/scalar type promotion for add, subtract, modulo binary operations
  • select() builtin: float-to-bool condition conversion
  • Arrayed texture coordinate handling (array index as separate component)
  • OpImageWrite operand ordering for storage textures
  • Sampled type derived from storage format for OpTypeImage (was defaulting to float)
  • atomicStore / atomicLoad — correct SPIR-V opcode emission
  • Workgroup variable layout decorations (Offset, ArrayStride)
  • OpDecorate Block deduplication — no longer emits duplicate decorations
  • Loop continuing block codegen — correct back-edge and merge block structure
  • Uniform struct wrapping — storage/uniform buffer structs get correct member decorations
  • Vector type conversion in composite constructors

0.13.1 - 2026-02-17

SPIR-V OpArrayLength fix, comprehensive benchmarks, and compiler allocation optimization (−32%).

Fixed

  • SPIR-V OpArrayLength — Runtime-sized array length queries (arrayLength()) now emit correct OpArrayLength instruction. Handles both bare storage arrays (wrapped in synthetic struct) and struct member arrays. Fixes "unsupported expression kind: ExprArrayLength" crash in compute shaders with dynamic buffer sizes.

Added

  • Comprehensive compiler benchmarks — 68 benchmarks across all 7 packages (root, wgsl, spirv, glsl, hlsl, msl, ir) with ReportAllocs() and b.SetBytes() throughput metrics. Covers full pipeline (lex→parse→lower→validate→generate), cross-backend comparison, and per-stage isolation. Table-driven by shader complexity (small/medium/large).

Changed

  • Compiler allocation reduction (−32.3%) — Large PBR shader: 1384→937 allocs, 203KB→134KB. Word arena for SPIR-V instructions (eliminates per-instruction make()), shared InstructionBuilder with Reset(), package-level lookup tables in lowerer (eliminates 6 map allocations per compile including 66-entry getMathFunction), capacity hints in parser/lexer/backend. SPIR-V generate stage: −58.4% allocs. Lowerer bytes: −68.6%.

0.13.0 - 2026-02-15

GLSL backend improvements, HLSL struct entry point fix, and SPIR-V vector/scalar multiply and bool conversion fixes.

Added

GLSL Backend

  • UBO blocks for struct uniforms — Struct uniform variables now emit layout(std140) uniform BlockName { ... } blocks instead of bare uniform declarations
  • Entry point struct I/O — Vertex/fragment entry points with struct parameters and return types now correctly emit in/out declarations for each struct member

Testing

  • SPIR-V loop iteration regression tests — Tests verifying correct loop codegen
  • SPIR-V conditional call result regression tests — Tests for function calls inside if/return blocks

Fixed

GLSL Backend

  • Array syntax — Array declarations now use correct GLSL syntax (float name[3] instead of float[3] name)
  • Built-in mappings — WGSL builtins correctly mapped to GLSL equivalents (gl_Position, gl_VertexID, etc.)
  • Entry point generation — Correct void main() generation with proper layout qualifiers
  • Local variable initializers — Variables with initializers now emit correct GLSL initialization

HLSL Backend

  • Struct entry point arguments with member bindings — Entry points accepting struct parameters with @builtin/@location member bindings now correctly generate HLSL input structs with proper semantics

SPIR-V Backend

  • OpVectorTimesScalar — Vector-scalar multiplication now emits the dedicated OpVectorTimesScalar instruction instead of component-wise multiply
  • Opcode number corrections — Fixed incorrect opcode values for vector/scalar operations
  • Bool-to-float conversionf32(bool_expr) now generates correct OpSelect instead of failing with "unsupported conversion"
  • Variable initialization from expressions — Additional fixes for var x = expr; patterns
  • Math function argument handling — Improved argument ordering for smoothstep, clamp, select, abs, min, max

Changed

  • Removed unused test helper validateSPIRVBinaryBasic

0.12.1 - 2026-02-13

Hotfix: wire up HLSL codegen (was causing DPC_WATCHDOG_VIOLATION BSOD on DX12), complete all 93 WGSL built-in functions.

Added

WGSL Frontend

  • All 93 WGSL built-in functions — Complete coverage of the W3C WGSL specification
    • 14 math functions: modf, frexp, ldexp, inverse, quantizeToF16, outerProduct, pack4xI8/U8, pack4xI8Clamp/U8Clamp, unpack4xI8/U8
    • 9 derivative functions: dpdx, dpdy, fwidth + Coarse/Fine variants
    • 4 relational functions: all, any, isnan, isinf
    • arrayLength for runtime-sized arrays

Testing

  • HLSL end-to-end golden tests — 14 tests covering the full WGSL → HLSL pipeline
    • Triangle shader, vertex/fragment, compute, uniform buffers
    • Math functions, control flow (if/else, switch, loops), swizzle
    • Entry point deduplication, stub detection, semantic validation

Fixed

HLSL Backend

  • Wire up codegen — Connect implemented expression/statement/function codegen to the writer
    • Entry point functions now generate actual HLSL bodies instead of stub placeholders
    • Regular functions call writeFunctionBody() for complete code generation
    • Entry points use writeEntryPointWithIO() with proper I/O structs and semantics
    • Removed duplicate function emission (entry points were written twice)
  • Fragment output semantic — Fragment shader @location(N) now maps to SV_TargetN (was TEXCOORD0)
  • Builtin extraction@builtin(vertex_index) correctly extracted from input struct to local variable
  • Array syntax — HLSL arrays now use correct syntax:
    • Declarations: float2 name[3] instead of float2[3] name
    • Initializers: {elem1, elem2} instead of type[3](elem1, elem2)
  • Named expression ordering — Expression names cached AFTER writing initializer (prevents float x = x;)

0.12.0 - 2026-02-10

SPIR-V function call support and compute shader codegen improvements for GPU SDF pipeline.

Added

SPIR-V Backend

  • OpFunctionCall — Function call support for non-entry-point functions
    • Emits OpFunctionCall with correct result type and argument passing
    • Enables modular WGSL shaders with helper functions

Testing

  • SPIR-V codegen analysis tests for SDF compute shaders (~2000 LOC)
    • sdf_analysis_test.go — Validates SPIR-V output for SDF batch shader patterns
    • var_ifelse_test.go — Tests variable initialization and if/else codegen

Fixed

SPIR-V Backend

  • Compute shader codegen — Multiple fixes for real-world compute shader patterns
    • Fixed var initialization from expressions (was emitting zero instead of computed value)
    • Fixed hex literal suffix parsing (0xFFu now correctly parsed as u32)
    • Improved expression handling for complex compute shader workflows

WGSL Frontend

  • Hex literal suffixes0xFFu and 0xFFi now correctly parsed with type suffix

0.11.1 - 2026-02-09

Critical SPIR-V opcode corrections and compute shader fixes. Fixes incorrect code generation for logical operators, comparisons, shifts, and local variable initializers — all discovered during GPU SDF compute shader development.

Fixed

SPIR-V Backend

  • OpLogicalAnd opcode — Was 164 (OpLogicalEqual), corrected to 167 per SPIR-V spec
    • WGSL && compiled to boolean equality instead of logical AND
    • false && false incorrectly evaluated to true
    • Caused filled rectangles to render as outlines in compute shaders
  • Comparison opcodes swappedOpFOrdGreaterThan and OpFOrdLessThanEqual had each other's values
    • > behaved as <= and vice versa in float comparisons
  • Shift opcodes rotatedOpShiftLeftLogical, OpShiftRightLogical, OpShiftRightArithmetic corrected
    • Bit shift operations produced wrong results (e.g., RGBA channel packing)
  • Local variable initializersvar x: f32 = 0.0 now emits OpStore for the initial value
    • OpVariable was emitted without initializer; LocalVariable.Init field was ignored by backend
    • Variables started with undefined values, causing conditional stores to not propagate
  • Entry point interface — Only Input/Output variables listed per SPIR-V 1.3 spec
    • Uniform and StorageBuffer variables no longer incorrectly included
  • Void function termination — Explicit OpReturn for functions without terminator
  • Boolean literal type — Consistent type deduplication via emitScalarType
  • Runtime-sized arraysOpTypeRuntimeArray for storage buffer array<T> (was returning error)
  • Type conversionsf32(x), u32(y) now generate correct conversion opcodes instead of compose
  • Unsigned integer literals1u suffix correctly parsed as u32 (was always i32)
  • Array stride — Automatic ArrayStride decoration for runtime arrays (std430 layout)

WGSL Frontend

  • f16 pre-registration removed — No longer emits OpCapability Float16 in shaders that don't use f16
    • Fixes Vulkan validation errors on devices without shaderFloat16 support

Added

  • OpLogicalEqual (164), OpLogicalNotEqual (165) SPIR-V opcodes
  • OpConvertFToU, OpConvertFToS, OpConvertSToF, OpConvertUToF, OpBitcast conversion opcodes
  • nagac now targets SPIR-V 1.3 by default

0.11.0 - 2026-02-07

SPIR-V control flow fix and 55 new WGSL built-in math functions.

Fixed

SPIR-V Backend

  • if/else control flow — Fixed invalid SPIR-V causing GPU hang on all drivers
    • Root cause: blockEndsWithTerminator() didn't handle StmtBlock wrapper from WGSL lowerer
    • Reject branch wrapped in StmtBlock{} by lowerStatement() vs flat lowerBlock() for accept
    • Result: OpReturn emitted in reject block followed by spurious OpBranch (two terminators)
    • Merge block left without terminator — undefined behavior in structured control flow
    • Fix: Added StmtBlock and nested StmtIf handling to blockEndsWithTerminator()
    • Added OpUnreachable emission for merge blocks when both branches terminate
    • Fixed AddLabel()AddLabelWithID() for correct merge block targeting

Added

WGSL Built-in Functions (55 new, 67 total math functions)

  • Trigonometric: cosh, sinh, tanh, acos, asin, atan, atan2, asinh, acosh, atanh
  • Angle conversion: radians, degrees
  • Decomposition: ceil, floor, round, fract, trunc
  • Exponential: exp, exp2, log, log2, pow
  • Geometric: distance, faceForward, reflect, refract
  • Computational: sign, fma, mix, step, smoothstep, inverseSqrt, saturate
  • Matrix: transpose, determinant
  • Bit manipulation: countTrailingZeros, countLeadingZeros, countOneBits, reverseBits, extractBits, insertBits, firstTrailingBit, firstLeadingBit
  • Data packing: pack4x8snorm, pack4x8unorm, pack2x16snorm, pack2x16unorm, pack2x16float
  • Data unpacking: unpack4x8snorm, unpack4x8unorm, unpack2x16snorm, unpack2x16unorm, unpack2x16float
  • Selection: select(falseVal, trueVal, condition) — Component-wise selection

Testing

  • SPIR-V if/else control flow test — validates correct block termination
  • 55 new math function compilation tests — all functions verified end-to-end
  • select() function test with scalar and vector variants

0.10.0 - 2026-02-01

WGSL language features: local const, switch statements, and storage texture support.

Added

WGSL Language Features

  • Local const declarationsconst inside function bodies with compile-time evaluation
  • Switch statements — Full switch/case/default support with SPIR-V OpSwitch generation

Storage Texture Support

  • ir.StorageFormat — 50+ texture formats (rgba8unorm, r32float, etc.)
  • ir.StorageAccess — Access modes (read, write, read_write)
  • ImageType extension — StorageFormat and StorageAccess fields for storage textures
  • WGSL parsingtexture_storage_2d<rgba8unorm, write> syntax support
  • SPIR-V generation — Correct OpTypeImage format decorations for storage images

SPIR-V Backend

  • ImageFormat constants — All SPIR-V image format values
  • StorageFormatToImageFormat() — IR to SPIR-V format conversion

Changed

  • Texture type parsing refactored to parseTextureType() with proper dimension/class detection
  • Removed unused textureDim() function

0.9.0 - 2026-01-30

Sampler support, swizzle operations, and SPIR-V development tools.

Added

Sampler Types

  • sampler and sampler_comparison type support in WGSL lowerer
  • Lazy sampler type registration (prevents spurious OpTypeSampler in shaders without textures)

Swizzle Support

  • Full swizzle support via OpVectorShuffle in SPIR-V backend
  • Handles all WGSL swizzle patterns (e.g., .xyz, .rgba, .xxyy)

Struct Member Bindings

  • Binding field on StructMember for @builtin/@location on struct members
  • hasPositionBuiltin validation for vertex shader struct returns

SPIR-V Development Tools

  • cmd/spvdis — SPIR-V disassembler for debugging shader compilation (~480 LOC)
  • cmd/texture_compile — Texture shader compile tool for testing (~95 LOC)

Fixed

SPIR-V Backend

  • Block decoration — Added OpDecorate Block for uniform/storage/push_constant struct types
    • Required by Vulkan VUID-StandaloneSpirv-Uniform-06676
  • Member offset ordering — Fixed emitTypes() to run before struct member decorations
  • Pointer/value semantics — Fixed entry point parameter handling for correct SPIR-V output

WGSL Lowerer

  • Uniform buffer alignment — Proper alignment calculation for uniform buffer layout
  • Sampler registration — Samplers now created on-demand instead of pre-registered

Changed

  • Extracted expression ref helpers to separate functions (fixes funlen linter)
  • Removed unused code from SPIR-V backend

0.8.4 - 2026-01-10

Critical SPIR-V backend fix for Intel Vulkan driver compatibility.

Fixed

SPIR-V Backend

  • Instruction ordering — Fixed OpVariable declarations to appear before OpLoad instructions
    • SPIR-V spec requires all OpVariable at START of first block
    • Intel Iris Xe Graphics was rejecting shaders with incorrect ordering
    • Other drivers (NVIDIA, AMD) were more lenient but technically incorrect
  • Array access semantics — Added OpLoad after OpAccessChain
    • OpAccessChain returns pointer, but consumers expect values
    • Fixed undefined behavior in array/struct member access

Changed

  • Constant naming — Renamed BuiltInId constants to BuiltInID (Go naming convention)
    • BuiltInVertexIdBuiltInVertexID
    • BuiltInInstanceIdBuiltInInstanceID
    • BuiltInPrimitiveIdBuiltInPrimitiveID
    • BuiltInInvocationIdBuiltInInvocationID
    • BuiltInSampleIdBuiltInSampleID
    • BuiltInWorkgroupIdBuiltInWorkgroupID
    • BuiltInLocalInvocationIdBuiltInLocalInvocationID
    • BuiltInGlobalInvocationIdBuiltInGlobalInvocationID

0.8.3 - 2026-01-04

Critical MSL backend fix for vertex shader position output.

Fixed

MSL Backend

  • [[position]] attribute placement — Fixed to emit on struct member instead of function signature
    • MSL requires [[position]] on struct member, not on function return type
    • Before: vertex float4 vs_main(...) [[position]] { } (invalid MSL)
    • After: struct vs_main_Output { float4 member [[position]]; }; vertex vs_main_Output vs_main(...) { } (valid MSL)
    • Matches behavior of original Rust naga implementation
  • Simple type output structs — Now generates output struct for simple types with builtin bindings
  • Return statement handling — Fixed return for simple type output structs

0.8.2 - 2026-01-04

MSL backend improvements for ARM64 macOS triangle rendering.

Fixed

MSL Backend

  • Triangle shader compilation — Fixed entry point output struct handling for vertex shaders
  • Return attribute handling — Improved @builtin(position) and other return type attributes
  • Struct member emission — Fixed struct field ordering and attribute placement

Added

  • MSL backend tests — Comprehensive test coverage for struct handling and entry points
  • xcrun integration tests — Real Metal shader validation on macOS (skipped on other platforms)

Changed

  • Improved WGSL lowering for complex struct types
  • Better error messages for unsupported shader features

Contributors

  • @ppoage — ARM64 macOS fixes and testing

0.8.1 - 2025-12-29

Fixed

WGSL Lowering

  • clamp() built-in function — Added missing clamp to math function map
    • Root cause: getMathFunction() was missing clampir.MathClamp mapping
    • Caused "unknown function: clamp" error during shader compilation
    • Affected any WGSL shader using clamp(value, min, max)

Added

  • Comprehensive math function testsTestMathFunctions covering all 12 WGSL built-in math functions
    • Tests: abs, min, max, clamp, sin, cos, tan, sqrt, length, normalize, dot, cross
    • Verifies correct IR generation for each function

0.8.0 - 2025-12-28

Code quality improvements and SPIR-V backend bug fixes.

Fixed

SPIR-V Backend

  • sign() type checking — Now correctly uses SSign for signed integers vs FSign for floats
  • atomicMin/Max signed vs unsigned — Now correctly uses OpAtomicSMin/OpAtomicSMax for signed integers and OpAtomicUMin/OpAtomicUMax for unsigned

WGSL Frontend

  • Function resolution — Added pre-registration pass for forward function references
  • Return type attributes — Parser now correctly handles attributes on return types (e.g., @builtin(position))

Changed

  • Removed dead Write() method from SPIR-V writer
  • Removed unused module field from spirv.Writer struct
  • Code cleanup in hlsl/types.go nolint directives

0.7.0 - 2025-12-28

HLSL backend for DirectX shader compilation (~8.8K new LOC).

Added

HLSL Backend (DirectX)

  • hlsl/backend.go — Public API: Options, TranslationInfo, Compile()
    • DXC-first strategy (Shader Model 6.0+)
    • FXC compatibility mode (Shader Model 5.1)
    • Vertex, fragment, and compute shader support
  • hlsl/writer.go — HLSL code generation writer (~400 LOC)
  • hlsl/types.go — Type generation (~500 LOC)
    • Scalars: float, half, double, int, uint, bool
    • Vectors: float2, float3, float4, int*, uint*
    • Matrices: float2x2, float3x3, float4x4
    • Structs with HLSL semantics
  • hlsl/expressions.go — Expression code generation (~1100 LOC)
    • Literals, binary/unary operations
    • Access expressions (array, struct, swizzle)
    • 70+ HLSL intrinsic functions
    • Texture sampling: Sample, SampleLevel, SampleBias, SampleGrad, Gather
    • Derivatives: ddx, ddy, fwidth (coarse/fine variants)
  • hlsl/statements.go — Statement code generation (~600 LOC)
    • Control flow (if, switch, loop, for)
    • GPU barriers (GroupMemoryBarrier, DeviceMemoryBarrier, AllMemoryBarrier)
    • Return, discard, break, continue
  • hlsl/storage.go — Buffer and atomic operations (~500 LOC)
    • ByteAddressBuffer, RWByteAddressBuffer
    • StructuredBuffer, RWStructuredBuffer
    • cbuffer for uniforms
    • Atomics: InterlockedAdd, And, Or, Xor, Min, Max, Exchange, CompareExchange
  • hlsl/functions.go — Entry point generation (~500 LOC)
    • Input/output structs with HLSL semantics (SV_Position, TEXCOORD, SV_Target)
    • [numthreads(x,y,z)] for compute shaders
    • Helper functions for safe math operations
  • hlsl/keywords.go — HLSL reserved word escaping (200+ keywords)
  • hlsl/conv.go — IR to HLSL type/semantic conversion
  • hlsl/namer.go — Identifier mangling for HLSL compliance
  • hlsl/errors.go — HLSL-specific error types
  • hlsl/shader_model.go — Shader Model version handling
  • hlsl/bind_target.go — Register binding management (b/t/s/u)

Notes

  • HLSL backend enables DirectX GPU rendering on Windows
  • Supports DirectX 11 (SM 5.1) and DirectX 12 (SM 6.0+)
  • Total: ~8800 lines of code

0.6.0 - 2025-12-25

GLSL backend for OpenGL shader compilation (~2.8K new LOC).

Added

OpenGL Shading Language Backend

  • glsl/backend.go — Public API: Options, TranslationInfo, Compile()
    • GLSLVersion configuration (GLSL 330, 400, 450, ES 300, ES 310)
    • Vertex, fragment, and compute shader support
  • glsl/writer.go — GLSL code generation writer
  • glsl/types.go — Type generation (~300 LOC)
    • Scalars: float, int, uint, bool
    • Vectors: vec2, vec3, vec4, ivec*, uvec*, bvec*
    • Matrices: mat2, mat3, mat4, mat2x3, etc.
    • Arrays with fixed size
    • Textures: sampler2D, sampler3D, samplerCube
  • glsl/expressions.go — Expression code generation (~400 LOC)
    • Literals, binary/unary operations
    • Access expressions (array, struct, swizzle)
    • GLSL built-in function calls
  • glsl/statements.go — Statement code generation (~300 LOC)
    • Variable declarations
    • Control flow (if, for, while, loop)
    • Assignments and function calls
  • glsl/functions.go — Entry point generation (~400 LOC)
    • void main() with layout qualifiers
    • Vertex: layout(location = N) in/out
    • Fragment: layout(location = N) out
    • Compute: layout(local_size_x/y/z) workgroup size
  • glsl/keywords.go — GLSL reserved word escaping (183 keywords)
  • glsl/backend_test.go — Comprehensive unit tests (40+ tests)

Changed

  • README.md updated with GLSL backend documentation
  • Architecture section now includes GLSL backend structure

Notes

  • GLSL backend enables OpenGL GPU rendering on all platforms
  • Supports OpenGL 3.3+, OpenGL ES 3.0+
  • Required by wgpu GLES backend for Linux/embedded platforms

0.5.0 - 2025-12-23

MSL backend for Metal shader compilation (~3.6K new LOC).

Added

Metal Shading Language Backend

  • msl/backend.go — Public API: Options, TranslationInfo, Compile()
  • msl/writer.go — MSL code generation writer
  • msl/types.go — Type generation (~400 LOC)
    • Scalars: float, half, int, uint, bool
    • Vectors: float2, float3, float4, etc.
    • Matrices: float2x2, float3x3, float4x4
    • Arrays with fixed size
    • Textures: texture2d, texture3d, texturecube
    • Samplers: sampler
  • msl/expressions.go — Expression code generation (~600 LOC)
    • Literals, binary/unary operations
    • Access expressions (array, struct, swizzle)
    • Math function calls
  • msl/statements.go — Statement code generation (~350 LOC)
    • Variable declarations
    • Control flow (if, for, while, loop)
    • Assignments and function calls
  • msl/functions.go — Entry point generation (~500 LOC)
    • [[vertex]] for vertex shaders
    • [[fragment]] for fragment shaders
    • [[kernel]] for compute shaders
    • Stage input/output structs
  • msl/keywords.go — MSL/C++ reserved word escaping
  • msl/backend_test.go — Unit tests for MSL compilation

Changed

  • Pre-release check script now uses kolkov/racedetector (Pure Go, no CGO)
  • Updated ecosystem: gogpu v0.5.0 (macOS Cocoa), wgpu v0.6.0 (Metal backend)

Notes

  • MSL backend enables Metal GPU rendering on macOS/iOS
  • Required by wgpu v0.6.0 Metal backend

0.4.0 - 2025-12-12

Compute shader support with atomics, barriers, and developer experience improvements (~2K new LOC).

Added

Compute Shader Infrastructure

  • wgsl/parser.go — Access mode parsing for storage buffers
    • var<storage, read> — Read-only storage buffer
    • var<storage, read_write> — Read-write storage buffer
    • var<workgroup> — Workgroup shared memory
  • wgsl/lower.go — Workgroup size extraction from @workgroup_size attribute
  • ir/ir.goAtomicType for atomic<u32> and atomic<i32>

Atomic Operations

  • wgsl/lower.go — Atomic function lowering (~150 LOC)
    • atomicAdd(&ptr, value) — Atomic addition
    • atomicSub(&ptr, value) — Atomic subtraction
    • atomicMin(&ptr, value) — Atomic minimum
    • atomicMax(&ptr, value) — Atomic maximum
    • atomicAnd(&ptr, value) — Atomic bitwise AND
    • atomicOr(&ptr, value) — Atomic bitwise OR
    • atomicXor(&ptr, value) — Atomic bitwise XOR
    • atomicExchange(&ptr, value) — Atomic exchange
    • atomicCompareExchangeWeak(&ptr, cmp, val) — Compare and exchange
  • spirv/backend.go — SPIR-V atomic emission (~100 LOC)
    • OpAtomicIAdd, OpAtomicISub, OpAtomicAnd, OpAtomicOr, OpAtomicXor
    • OpAtomicUMin, OpAtomicUMax, OpAtomicExchange, OpAtomicCompareExch
  • ir/expression.goExprAtomicResult for atomic operation results

Workgroup Barriers

  • wgsl/lower.go — Barrier function lowering
    • workgroupBarrier() — Synchronize workgroup threads
    • storageBarrier() — Memory barrier for storage buffers
    • textureBarrier() — Memory barrier for textures
  • spirv/backend.goOpControlBarrier emission with memory semantics

Address-of and Dereference Operators

  • wgsl/lower.go& and * operator handling
    • &var — Returns pointer (no-op for storage variables)
    • *ptr — Creates ExprLoad for dereferencing

Unused Variable Warnings

  • wgsl/lower.go — Warning infrastructure (~50 LOC)
    • Warning type with message and source span
    • LowerResult struct containing module and warnings
    • LowerWithWarnings() API for accessing warnings
    • Variables prefixed with _ are intentionally ignored
    • checkUnusedVariables() called after each function

Better Error Messages

  • wgsl/errors.goSourceError type with source location
  • wgsl/errors.goFormatWithContext() for pretty error display
  • wgsl/lower.goLowerWithSource() preserves source for errors

Changed

  • spirv/spirv.go — Added SPIR-V opcodes for atomics and barriers
  • Total: 203 tests across all packages (+79 from v0.3.0)

Fixed

  • Type switch in emitAtomic now uses assignment form (gocritic fix)

0.3.0 - 2025-12-11

let type inference, array initialization, and texture sampling (~3K new LOC).

Added

Type Inference for let Bindings

  • wgsl/lower.goinferTypeFromExpression() method (~80 LOC)
    • Supports inferring type from any expression
    • let x = 1.0 → inferred f32
    • let v = vec3(1.0) → inferred vec3
    • let n = normalize(v) → inferred from function return type
  • wgsl/lower_type_inference_test.go — 6 new tests

Array Initialization Syntax

  • wgsl/lower.go — Array constructor handling (~50 LOC)
    • array(1, 2, 3) shorthand with inferred type and size
    • array<f32, 3>(...) explicit syntax
    • Element type inferred from first element
  • Tests for array shorthand and vector arrays

Texture Sampling Operations

  • wgsl/lower.go — Texture function lowering (~250 LOC)
    • textureSample(t, s, coord) — Basic sampling
    • textureSampleBias(t, s, coord, bias) — With LOD bias
    • textureSampleLevel(t, s, coord, level) — Specific mip level
    • textureSampleGrad(t, s, coord, ddx, ddy) — With derivatives
    • textureLoad(t, coord, level) — Direct texel load
    • textureStore(t, coord, value) — Write to storage texture
    • textureDimensions(t) — Get texture size
    • textureNumLevels(t) — Get mip count
    • textureNumLayers(t) — Get array layer count
  • spirv/backend.go — SPIR-V image operations (~200 LOC)
    • OpSampledImage — Combine texture and sampler
    • OpImageSampleImplicitLod — textureSample
    • OpImageSampleExplicitLod — textureSampleLevel
    • OpImageFetch — textureLoad
    • OpImageWrite — textureStore
    • OpImageQuerySize* — textureDimensions
    • OpImageQueryLevels — textureNumLevels
    • Helper methods: getSampledImageType(), emitVec4F32Type()

Changed

  • wgsl/lower.golowerLocalVar() supports optional type with inference
  • wgsl/lower.goisBuiltinConstructor() includes "array"
  • wgsl/lower.golowerBuiltinConstructor() handles array shorthand
  • naga_test.go — Enabled TestCompileWithMathFunctions (was skipped)
  • Total: 124 tests across all packages

Fixed

  • Array size now correctly uses pointer (*uint32) per IR definition
  • SPIR-V OpImageFetch uses coordinate without sampler

0.2.0 - 2025-12-11

Type inference and SPIR-V backend improvements (~2K new LOC).

Added

Type Inference System

  • ir/resolve.go — Complete type inference engine (~500 LOC)
    • Resolves types for all 25+ expression kinds
    • Handles literals, constants, composites, binary/unary ops
    • Supports nested types (vectors, matrices, arrays, structs)
    • TypeResolution struct for dual handle/inline representation
  • ir/resolve_test.go — 8 comprehensive unit tests

Type Deduplication

  • ir/registry.go — Type registry for SPIR-V compliance (~100 LOC)
    • Ensures each unique type appears exactly once
    • Normalized type keys for structural equality
    • Supports all IR type kinds
  • ir/registry_test.go — 18 unit tests

SPIR-V Backend Improvements

  • Proper type resolution instead of placeholders
  • Correct int/float/uint opcode selection:
    • IAdd/ISub/IMul vs FAdd/FSub/FMul
    • SDiv/UDiv/FDiv for signed/unsigned/float
    • IEqual/SLessThan vs FOrdEqual/FOrdLessThan
  • emitInlineType() for temporary types
  • Range-based iteration to avoid large struct copies

Testing

  • spirv/shader_test.go — 10 end-to-end shader compilation tests
  • wgsl/lower_type_inference_test.go — 3 integration tests
  • wgsl/deduplication_test.go — Type deduplication tests
  • Total: 67+ tests across all packages

Changed

  • ir/ir.go — Added TypeResolution struct and ExpressionTypes to Function
  • wgsl/lower.go — Integrated type registry and expression type tracking
  • spirv/backend.go — Uses real types from inference system (~350 lines changed)
  • ir/validate.go — Range-based iteration for performance

Fixed

  • SPIR-V binary output now has correct type IDs for all expressions
  • Comparison operators correctly return bool or vec<bool>
  • Math functions select correct int vs float GLSL.std.450 instructions

0.1.0 - 2025-12-10

First stable release. Complete WGSL to SPIR-V compilation pipeline (~10K LOC).

Added

Intermediate Representation (IR)

  • ir/expression.go — 33 expression types (~520 LOC)
    • Literals (f32, f64, i32, u32, bool)
    • Binary/Unary operators (17 binary, 3 unary)
    • Access expressions (array, struct, swizzle)
    • Math functions (60+ supported)
    • Texture operations (sample, load, query)
  • ir/statement.go — 16 statement types (~320 LOC)
    • Control flow (if, loop, switch, break, continue)
    • Memory operations (store, atomic)
    • Function calls
  • ir/validate.go — Comprehensive IR validation (~750 LOC)
    • Type validation
    • Expression validation
    • Statement validation
    • Entry point validation

AST to IR Lowering

  • wgsl/lower.go — AST → IR converter (~1050 LOC)
    • Type resolution (scalars, vectors, matrices, arrays, structs)
    • Built-in type recognition
    • Binding resolution (@builtin, @location, @group/@binding)
    • Expression lowering
    • Statement lowering

SPIR-V Backend

  • spirv/writer.go — Binary module builder (~670 LOC)
    • SPIR-V header generation
    • Instruction encoding
    • String encoding with padding
  • spirv/backend.go — IR → SPIR-V translator (~1500 LOC)
    • Type emission (all IR types)
    • Constant emission (scalars, composites)
    • Function emission
    • Expression emission (33 expression types)
    • Control flow (if, loop, break, continue)
    • 40+ built-in math functions via GLSL.std.450
    • Derivative functions (dpdx, dpdy, fwidth)
  • spirv/spirv.go — SPIR-V constants and opcodes
    • 100+ opcodes
    • 81 GLSL.std.450 extended instructions

Public API

  • naga.go — Public API (~160 LOC)
    • Compile(source) — One-function compilation
    • CompileWithOptions(source, opts) — Custom options
    • Parse(), Lower(), Validate(), GenerateSPIRV() — Individual stages

CLI Tool

  • cmd/nagac/main.go — Command-line compiler
    • -o output file
    • -debug include debug names
    • -validate enable validation
    • -version show version

Tests

  • naga_test.go — 7 integration tests
  • ir/validate_test.go — 12 validation tests
  • spirv/backend_test.go — Backend tests
  • spirv/writer_test.go — Writer tests
  • wgsl/lower_test.go — Lowering tests

Changed

  • Updated .golangci.yml with exclusions for compiler complexity
  • Expanded spirv/spirv.go with full opcode set