Block-Level Dispatcher Optimization

April 10, 2026 · View on GitHub

Date: 2026-02-04 Status: ✅ IMPLEMENTED AND TESTED Test Pass Rate: 100% (2006/2006 unit tests)

Problem Statement

The original control flow implementation emitted a complete dispatcher at each call site (~150 bytes per call). For code with multiple sequential calls in the same block:

for (1..3) {
    A();  # 150 bytes dispatcher
    B();  # 150 bytes dispatcher (identical!)
    C();  # 150 bytes dispatcher (identical!)
    D();  # 150 bytes dispatcher (identical!)
}

Total: 4 × 150 = 600 bytes of mostly redundant code.

Solution: Block-Level Shared Dispatchers

Key Insight: All calls within the same block with the same visible loops can share ONE dispatcher!

Implementation Strategy

Loop State Signature: Compute unique signature for visible loops using label names + identity hash codes
Dispatcher Reuse: Map signatures to dispatcher labels in JavaClassInfo.blockDispatcherLabels
First Use: Create and emit dispatcher on first call with a signature
Subsequent Calls: Reuse existing dispatcher by jumping to its label

Code Structure

Each call site (~20 bytes):

ASTORE controlFlowTempSlot     // Store result
ALOAD controlFlowTempSlot
INVOKEVIRTUAL isNonLocalGoto() // Check if marked
IFEQ notControlFlow
GOTO blockDispatcher            // Jump to shared dispatcher

notControlFlow:
ALOAD controlFlowTempSlot       // Not marked, continue

Note: In addition to the above, each call site also emits an inline tail-call trampoline (~30 instructions) that handles goto &NAME without jumping to the block dispatcher. The actual per-call-site bytecode is ~100-150 bytes, not ~20.

Block dispatcher (emitted once, ~150 bytes):

blockDispatcher:
  Get control flow type ordinal
  Check if LAST/NEXT/REDO (0/1/2)
  IF_ICMPGT propagateToCaller    // ordinals > 2 handled below
  // Higher ordinals (GOTO=3, TAILCALL=4, RETURN=5):
  // - GOTO and TAILCALL: propagate to caller
  // - RETURN (ordinal 5): unwrap return value for non-map/grep blocks,
  //   then jump to returnLabel; otherwise propagate
  Loop through visible loop labels:
    Match label name
    Dispatch by type to appropriate label
  If no match, propagate to caller

Skip over dispatcher:

GOTO skipDispatcher             // Skip dispatcher in normal flow
blockDispatcher:
  [dispatcher code]
skipDispatcher:
  [normal execution continues]

Results

Bytecode Savings

For N calls sharing the same loop state:

Old: 150N bytes
New: 20N + 150 + 3 bytes
Savings: 130N - 153 bytes

Note: These calculations use a simplified ~20 bytes per call site. The actual per-call-site overhead is ~100-150 bytes due to the inline tailcall trampoline. The block dispatcher sharing still provides significant savings, but the absolute numbers differ from this simplified model.

Examples:

Calls	Old (bytes)	New (bytes)	Savings	Percentage
1	150	173	-23	-15% ⚠️
2	300	193	107	36% ✅
4	600	233	367	61% ✅
10	1500	353	1147	76% ✅

Real-World Measurements

Test case: 4 sequential calls in loop (for { A(); B(); C(); D(); })

Master: 2232 bytecode lines
Block dispatcher: 2139 bytecode lines
Savings: 93 lines (4.2%)
CHECKCAST operations: 23 → 17 (26% reduction)

Complex nested loops: No regression (1374 lines maintained)

Implementation Files

Modified Files

JavaClassInfo.java
- Added blockDispatcherLabels map to track dispatcher reuse
- Added getLoopStateSignature() method to compute unique signatures
- Imports: Uses wildcard import java.util.*
EmitSubroutine.java
- Modified call-site emission to use block-level dispatchers
- Added emitBlockDispatcher() helper method
- Simplified call-site code to ~20 bytes (check + GOTO)
Dereference.java (backend/jvm/Dereference.java)
- Contains a full copy of the block-dispatcher pattern (lines 982-1109)
- Includes getLoopStateSignature(), blockDispatcherLabels lookup, the tailcall trampoline, and calls EmitSubroutine.emitBlockDispatcher()
- Significant second emission point alongside EmitSubroutine.java
control-flow.md
- Documented block-level dispatcher approach
- Updated performance metrics
- Explained why method-level centralization doesn't work

Technical Details

Loop State Signature

Computed by concatenating loop label information:

"UNLABELED@12345|OUTER@67890|INNER@24680"

Uses System.identityHashCode() to uniquely identify loop objects
Same signature = same visible loops = can share dispatcher
Different signatures = different loop contexts = need separate dispatchers

Why This Works

Scope Safety: Dispatcher stays within loop scope (no frame computation issues)
Visibility: Only checks loops visible at that point
Reuse: Multiple calls share one dispatcher automatically
Backward Jumps: Work correctly because we're still in scope