Acceptance Backtests (ThetaData)

January 26, 2026

This document is the canonical manual acceptance suite for LumiBot backtesting (ThetaData) and release validation.

Backtesting Definitions (Accuracy + Speed)

Accuracy (gold standard): the ability to replay a period that was traded live and reproduce the broker’s realized behavior (fills + PnL) within defined tolerances (tick size, fee model). Acceptance runs are a deterministic regression firewall; on their own, they do not prove “real world accuracy”.
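A minimal sketch of what “within defined tolerances” could mean for a Tier 3 live replay. It assumes fills can be paired one-to-one in order, that a one-tick price difference is acceptable, and that fee-model differences are absorbed by a fixed PnL slack. `Fill`, `fills_match`, and `realized_pnl_matches` are hypothetical names for illustration, not LumiBot APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fill:
    """One filled order leg (hypothetical record shape, for illustration only)."""
    symbol: str
    quantity: int
    price: float


def fills_match(live: list[Fill], backtest: list[Fill], tick_size: float) -> bool:
    """True if each backtest fill reproduces the paired live fill within one tick.

    Assumption: fills are paired in order; symbol and quantity must match exactly.
    """
    if len(live) != len(backtest):
        return False
    for lv, bt in zip(live, backtest):
        if lv.symbol != bt.symbol or lv.quantity != bt.quantity:
            return False
        # Allow up to one tick of price difference (plus a tiny float epsilon).
        if abs(lv.price - bt.price) > tick_size + 1e-9:
            return False
    return True


def realized_pnl_matches(live_pnl: float, backtest_pnl: float, fee_tolerance: float) -> bool:
    """Compare realized PnL, allowing slack for fee-model differences (assumed fixed amount)."""
    return abs(live_pnl - backtest_pnl) <= fee_tolerance
```

Comparing fills before PnL keeps failures diagnosable: a PnL mismatch with matching fills points at the fee model, not the fill simulator.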

Accuracy validation ladder (Tier 3 is the real gold standard)

  • Tier 1 (regression): deterministic acceptance backtests + vendor/artifact parity to detect drift.
  • Tier 2 (audit): manual reviews around known hard edges (session gaps, holidays/early closes, rolls, rounding).
  • Tier 3 (gold): live replay baseline — replay an interval that was traded live and reproduce broker fills + realized PnL within tolerances.

Speed: acceptance warm-cache runs complete in bounded wall time and are queue-free (no downloader submits), proving the cache and data semantics are stable.

Resilience: acceptance runs must also prove the “end of backtest” pipeline is stable:

  • stats summary must not crash (CAGR/datetime edge cases, NaN handling, etc.),
  • tearsheet/plot generation should either succeed or fail in a controlled way (a plotting failure must not mask a successful simulation by reporting the whole run as “failed”),
  • and the run must still emit actionable artifacts (trades.csv, stats.csv, logs.csv) even when optional post-processing fails.
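The artifact requirement above can be checked mechanically after each run. This sketch assumes artifacts are named `<run_id>_trades.csv`, `<run_id>_stats.csv`, and `<run_id>_logs.csv` in the log directory; the exact naming and the helper names are assumptions, not LumiBot behavior.

```python
from pathlib import Path

# Assumed artifact suffixes; adjust to the filenames your runs actually emit.
REQUIRED_SUFFIXES = ("_trades.csv", "_stats.csv", "_logs.csv")


def expected_artifacts(run_id: str) -> list[str]:
    """Filenames that must exist for a run, even if plotting/tearsheet failed."""
    return [f"{run_id}{suffix}" for suffix in REQUIRED_SUFFIXES]


def missing_artifacts(log_dir: Path, run_id: str) -> list[str]:
    """Return required artifact filenames that are absent from log_dir."""
    existing = {p.name for p in log_dir.iterdir()} if log_dir.is_dir() else set()
    return [name for name in expected_artifacts(run_id) if name not in existing]
```

Checking these core CSVs separately from the optional HTML outputs keeps a tearsheet failure from hiding whether the simulation itself produced results.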

IBKR acceptance backtests (Crypto + Futures)

This repo’s acceptance harness (tests/backtest/test_acceptance_backtests_ci.py) includes deterministic, cache-backed:

  • IBKR crypto acceptance (minute bars)
  • IBKR CME equity futures acceptance (minute/day bars; fixed contract in 2025)

Key invariant (same as ThetaData acceptance):

  • Acceptance runs must be deterministic and queue-free (warm S3 cache invariant).

Implementation note:

  • Strategy._dump_settings() records thetadata_queue_telemetry. Despite the ThetaData name, this is a global downloader/queue counter, because IBKR REST backtesting also routes through queue_request(). IBKR acceptance therefore asserts:
    • thetadata_queue_telemetry.submit_requests == 0
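A sketch of that assertion, assuming `thetadata_queue_telemetry` is stored as a nested object at the top level of each `*_settings.json` (the layout and helper names are assumptions). A missing key raises instead of defaulting to 0, so a run without telemetry is never mistaken for a queue-free run.

```python
import json
from pathlib import Path


def queue_submit_requests(settings: dict) -> int:
    """Return thetadata_queue_telemetry.submit_requests (KeyError if telemetry is absent)."""
    return int(settings["thetadata_queue_telemetry"]["submit_requests"])


def queue_offenders(settings_dir: Path) -> dict[str, int]:
    """Map each *_settings.json with non-zero downloader submits to its submit count."""
    offenders = {}
    for path in sorted(settings_dir.glob("*_settings.json")):
        count = queue_submit_requests(json.loads(path.read_text()))
        if count != 0:
            offenders[path.name] = count
    return offenders
```

Usage: `assert not queue_offenders(Path("logs")), "acceptance run used the downloader"`.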

Current IBKR slugs (in tests/backtest/acceptance_backtests_baselines.json):

  • ibkr_crypto_acceptance_btc_usd
  • ibkr_mes_futures_acceptance

Local runbook (requires downloader + S3 env; do not paste secrets into logs):

  • Warm S3 (tripwire OFF): python3 scripts/warm_acceptance_backtests_cache.py --slug ibkr_crypto_acceptance_btc_usd --slug ibkr_mes_futures_acceptance
  • Run only IBKR acceptance locally: python3 -m pytest -q tests/backtest/test_acceptance_backtests_ci.py -k ibkr

IBKR note: historical series can legitimately omit the final 1–3 bars of a requested window (minute/hour). LumiBot treats cache coverage within this tolerance as “good enough” to avoid repeated downloader retries in deterministic runs (see lumibot/tools/ibkr_helper.py).
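The tail tolerance can be pictured as follows. This is a sketch of the idea only, not the code in lumibot/tools/ibkr_helper.py; `tail_coverage_ok` and `MAX_MISSING_TAIL_BARS` are hypothetical names.

```python
from datetime import datetime, timedelta

MAX_MISSING_TAIL_BARS = 3  # the 1–3 bar tolerance described above


def tail_coverage_ok(
    last_cached_bar: datetime,
    requested_end: datetime,
    bar_size: timedelta,
    max_missing: int = MAX_MISSING_TAIL_BARS,
) -> bool:
    """True if the cached series ends at most `max_missing` bars before requested_end."""
    if last_cached_bar >= requested_end:
        return True
    # timedelta // timedelta gives a whole number of bars.
    missing_bars = (requested_end - last_cached_bar) // bar_size
    return missing_bars <= max_missing
```

Treating a short tail gap as covered avoids re-requesting bars the vendor will never return, which is what keeps repeated runs deterministic and queue-free.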

Futures-specific note: backtest windows can begin/end inside long us_futures closed intervals (weekends/holidays). LumiBot treats fully closed boundary gaps as cache-satisfied (no fetch attempts), keeping acceptance deterministic and queue-free.
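A sketch of the boundary-gap rule, assuming the closed intervals (weekends, holidays) are available as half-open `[start, end)` pairs. `gap_is_cache_satisfied` is a hypothetical helper; for simplicity it does not merge adjacent closed intervals.

```python
from datetime import datetime

Interval = tuple[datetime, datetime]  # half-open [start, end)


def gap_is_cache_satisfied(gap: Interval, closed: list[Interval]) -> bool:
    """True if an uncached gap lies entirely inside one market-closed interval.

    Such a gap cannot contain bars, so no fetch should be attempted for it.
    """
    gap_start, gap_end = gap
    return any(c_start <= gap_start and gap_end <= c_end for c_start, c_end in closed)
```

Usage: with a weekend close of `[Fri 17:00, Sun 18:00)`, a gap from Saturday 00:00 to Sunday 12:00 is satisfied; a gap running into Monday is not.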

For design details and live-broker alignment notes, see:

  • docs/IBKR_FUTURES_BACKTESTING.md

Update protocol (read this before editing)

  • Append only: never overwrite history rows; add a new row per run.
  • Every speed row must include machine specs (even if repetitive) so future comparisons are meaningful.
  • Every row must include the run_id so artifacts remain auditable in Strategy Library/logs/.
  • Release gate runs must use production-like flags (see below).
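The row rules above lend themselves to a lint check. This sketch assumes run_ids follow the `<Strategy>_<YYYY-MM-DD>_<HH-MM>_<6-char id>` shape seen in the history tables and treats any row with a numeric wall time as a speed row; both are inferences from this document, and the helper name is hypothetical.

```python
import re

# Inferred from existing run_ids such as AAPLDeepDipCalls_2026-01-05_09-33_aLJW35.
RUN_ID_RE = re.compile(r"^[A-Za-z0-9]+_\d{4}-\d{2}-\d{2}_\d{2}-\d{2}_[A-Za-z0-9]{6}$")


def history_row_problems(row: dict[str, str]) -> list[str]:
    """Return rule violations for one history row (empty list means the row is OK)."""
    problems = []
    if not RUN_ID_RE.match(row.get("run_id", "")):
        problems.append("run_id does not look like <Strategy>_<YYYY-MM-DD>_<HH-MM>_<id>")
    # A row with a measured wall time is a speed row and must carry machine specs.
    is_speed_row = row.get("wall_time_s", "(n/a)") not in ("", "(n/a)")
    if is_speed_row and row.get("machine", "") in ("", "(unknown)"):
        problems.append("speed row is missing machine specs")
    return problems
```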

CI acceptance gate (source of truth)

  • GitHub CI runs the real Strategy Library demos via tests/backtest/test_acceptance_backtests_ci.py as part of the normal tests/backtest/ pytest run (these are not a special workflow/job).
  • CI assertions are strict and driven by tests/backtest/acceptance_backtests_baselines.json (generated from Strategy Library/logs/ via scripts/generate_acceptance_backtests_baselines.py).
  • When updating expected outputs: append rows here and update the baseline JSON from the chosen baseline run_ids.
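A sketch of a strict baseline comparison. The baseline entry shape (`total_return_pct`, `cagr_pct`, `max_dd_pct`, `ci_cap_seconds`) and the 0.01-point rounding tolerance are assumptions for illustration; the real schema of acceptance_backtests_baselines.json may differ.

```python
import math

METRICS = ("total_return_pct", "cagr_pct", "max_dd_pct")  # hypothetical key names


def baseline_mismatches(observed: dict, baseline: dict, abs_tol: float = 0.01) -> list[str]:
    """List headline-metric drifts and CI-cap violations (empty list means pass)."""
    problems = []
    for key in METRICS:
        if not math.isclose(observed[key], baseline[key], rel_tol=0.0, abs_tol=abs_tol):
            problems.append(f"{key}: observed {observed[key]} != baseline {baseline[key]}")
    if observed["backtest_time_seconds"] > baseline["ci_cap_seconds"]:
        problems.append(
            f"backtest_time_seconds {observed['backtest_time_seconds']} "
            f"exceeds CI cap {baseline['ci_cap_seconds']}"
        )
    return problems
```

Reporting every mismatch at once, rather than failing on the first, makes it easier to tell a data-completeness rebaseline (all metrics move) from a single regression.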

Current status (2026-01-06) — why acceptance is failing right now

As of 2026-01-06, the acceptance suite is not currently green when run with the tripwire enabled, even locally:

  • pytest -q tests/backtest/test_acceptance_backtests_ci.py is failing for multiple strategies because the subprocess attempts to use the Data Downloader and is hard-killed by the tripwire (exit=86).

This is not a “mysterious CI-only flake”. It is the expected consequence of two deliberate design choices:

  1. The acceptance harness forces a fresh disk cache per run via:
    • LUMIBOT_CACHE_FOLDER=<run_dir>/cache, which prevents a developer’s warm local cache from hiding missing S3 objects.
  2. The acceptance harness enforces a warm S3 invariant:
    • canonical windows are expected to already exist in S3 v44,
    • so any downloader/queue usage is treated as a regression and fails the test.
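The two design choices above can be sketched as a subprocess environment plus an exit-code interpretation. `acceptance_env` and `describe_exit` are hypothetical helpers; only the variable name LUMIBOT_CACHE_FOLDER and the tripwire exit code 86 come from this document.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

TRIPWIRE_EXIT_CODE = 86  # exit status when the downloader tripwire kills the run


def acceptance_env(run_dir: Path, base_env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy the environment and point LumiBot at a per-run disk cache under run_dir."""
    env = dict(os.environ if base_env is None else base_env)
    env["LUMIBOT_CACHE_FOLDER"] = str(run_dir / "cache")
    return env


def describe_exit(returncode: int) -> str:
    """Translate a strategy subprocess exit code into an actionable message."""
    if returncode == 0:
        return "ok"
    if returncode == TRIPWIRE_EXIT_CODE:
        return "downloader tripwire fired: S3 is missing objects for this window; re-warm it"
    return f"failed with exit code {returncode}"
```

Passing the result as `env=` to `subprocess.run` gives each run an empty local cache, so only S3 can satisfy data requests.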

Recent correctness work (daily-cadence option MTM fallback to intraday NBBO snapshots) increased the set of required quote snapshot/history objects. S3 v44 must be re-warmed for those objects before acceptance will be queue-free again.
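The fallback can be pictured as follows. This is a simplified sketch, not LumiBot’s implementation: it assumes a daily mark prefers the EOD value and otherwise uses the mid of an intraday NBBO snapshot, and `daily_option_mark` is a hypothetical name.

```python
from typing import Optional


def daily_option_mark(
    eod_mark: Optional[float],
    nbbo_bid: Optional[float],
    nbbo_ask: Optional[float],
) -> Optional[float]:
    """Prefer the EOD mark; fall back to an intraday NBBO mid when it is missing."""
    if eod_mark is not None:
        return eod_mark
    # A zero bid is valid for deep OTM options; require a positive, uncrossed ask.
    if nbbo_bid is not None and nbbo_ask is not None and 0 <= nbbo_bid <= nbbo_ask and nbbo_ask > 0:
        return (nbbo_bid + nbbo_ask) / 2
    return None  # no usable mark; callers decide whether to forward-fill
```

Each fallback needs a quote snapshot object for that contract and day, which is why the change enlarged the set of S3 objects that must be warmed.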

What to do next (operator runbook)

  1. Warm the missing objects in S3 v44 outside CI (tripwire OFF):
    • use scripts/warm_acceptance_backtests_cache.py (runs the same acceptance scripts but allows downloader usage).
  2. Re-run acceptance with tripwire ON and confirm:
    • no downloader tripwire triggers,
    • thetadata_queue_telemetry.submit_requests == 0 in each *_settings.json.
  3. If headline metrics drift after data becomes complete:
    • rebaseline via scripts/generate_acceptance_backtests_baselines.py
    • update this doc’s #### Expected Results blocks in lockstep with the baseline JSON.

For full details, see:

  • docs/handoffs/2026-01-06_ACCEPTANCE_BACKTESTS_HANDOFF.md
  • docs/investigations/2026-01-06_THETADATA_OPTION_EOD_GAPS_DAILY_MTM.md

Window semantics (avoid false “drift”)

  • LumiBot treats BACKTESTING_END as exclusive.
  • As a result, *_settings.json records backtesting_end as (BACKTESTING_END - 1 day) 23:59:00 (local market TZ).
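A sketch of the mapping from an exclusive BACKTESTING_END to the recorded value. The default time zone America/New_York is an assumption for US equities and options; the helper name is hypothetical.

```python
from datetime import date, datetime, time, timedelta
from zoneinfo import ZoneInfo


def recorded_backtesting_end(backtesting_end: str, market_tz: str = "America/New_York") -> datetime:
    """Map an exclusive BACKTESTING_END date to the inclusive value in *_settings.json."""
    end = date.fromisoformat(backtesting_end)
    last_included_day = end - timedelta(days=1)
    return datetime.combine(last_included_day, time(23, 59), tzinfo=ZoneInfo(market_tz))
```

For example, `recorded_backtesting_end("2025-12-01")` gives `2025-11-30T23:59:00-05:00`, which is why the history tables show windows ending 2025-11-30 for a CI end of 2025-12-01.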

Guardrails

  • Do not modify demo strategy files under Strategy Library/Demos/. Fix issues in LumiBot (or the data-downloader if proven root cause).
  • Use a downloader endpoint appropriate for your environment (do not hard-code private hosts or IPs into the repo):
    • DATADOWNLOADER_BASE_URL must be set (local: http://localhost:8080, remote: https://<your-downloader-host>:8080)
    • DATADOWNLOADER_API_KEY must be set (value lives in env/secrets; do not paste into docs).
  • Wrap long runs with /Users/robertgrzesik/bin/safe-timeout ….

Release time gate (required)

Each acceptance run must finish within 900s (15 minutes) with production-like flags:

  • BACKTESTING_QUIET_LOGS=false
  • BACKTESTING_SHOW_PROGRESS_BAR=true
  • SHOW_PLOT=True, SHOW_INDICATORS=True, SHOW_TEARSHEET=True

Notes:

  • For debugging only, you may temporarily disable artifact generation to isolate compute vs plotting.
  • The release gate is always the production-like run above.
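A sketch of a release-gate check combining the production-like flags and the 900s limit. Comparing flag values case-insensitively is an assumption (this document writes both `True` and `true`); the helper name is hypothetical.

```python
from typing import Mapping

PRODUCTION_LIKE_FLAGS = {
    "BACKTESTING_QUIET_LOGS": "false",
    "BACKTESTING_SHOW_PROGRESS_BAR": "true",
    "SHOW_PLOT": "true",
    "SHOW_INDICATORS": "true",
    "SHOW_TEARSHEET": "true",
}
RELEASE_TIME_LIMIT_S = 900  # 15 minutes


def release_gate_problems(env: Mapping[str, str], wall_time_s: float) -> list[str]:
    """Return reasons a run does not qualify for the release gate (empty list means pass)."""
    problems = []
    for name, expected in PRODUCTION_LIKE_FLAGS.items():
        actual = env.get(name)
        if actual is None or actual.strip().lower() != expected:
            problems.append(f"{name}={actual!r} (expected {expected})")
    if wall_time_s > RELEASE_TIME_LIMIT_S:
        problems.append(f"wall time {wall_time_s:.1f}s exceeds {RELEASE_TIME_LIMIT_S}s")
    return problems
```

Rejecting runs with artifact generation disabled keeps debugging runs from being mistaken for release-gate evidence.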

Run from Strategy Library/ so artifacts land in Strategy Library/logs/:

```bash
cd "/Users/robertgrzesik/Documents/Development/Strategy Library"
/Users/robertgrzesik/bin/safe-timeout 900s env \
  PYTHONPATH="/Users/robertgrzesik/Documents/Development/lumivest_bot_server/strategies/lumibot" \
  IS_BACKTESTING=True BACKTESTING_DATA_SOURCE=thetadata \
  DATADOWNLOADER_BASE_URL="http://localhost:8080" \
  SHOW_PLOT=True SHOW_INDICATORS=True SHOW_TEARSHEET=True \
  BACKTESTING_QUIET_LOGS=false BACKTESTING_SHOW_PROGRESS_BAR=true \
  BACKTESTING_START=YYYY-MM-DD BACKTESTING_END=YYYY-MM-DD \
  python3 "Demos/<strategy>.py"
```

Machine specs (required on every speed row)

Example format (repeat this on every speed row):

  • macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8
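A small formatter keeps the spec string identical across rows. The OS version, CPU name, and RAM are passed in by hand here (reading them portably is out of scope); the Python version defaults to `platform.python_version()`. The helper name is hypothetical.

```python
import platform
from typing import Optional


def format_machine_specs(
    os_version: str,
    cpu: str,
    ram_gb: int,
    python_version: Optional[str] = None,
    os_name: str = "macOS",
) -> str:
    """Build the machine-spec string in the format required on every speed row."""
    python_version = python_version or platform.python_version()
    return f"{os_name}={os_version}; CPU={cpu}; RAM={ram_gb}GB; Python={python_version}"
```

Usage: `format_machine_specs("26.1", "Apple M3 Max", 48, "3.11.8")` returns `macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8`.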

Canonical suite (currently 7 cases)

This suite will grow over time (more strategies, data sources, and/or windows). When adding a new case:

  • add a new section below (including #### Expected Results)
  • add/update the matching entry in tests/backtest/acceptance_backtests_baselines.json
  • keep docs/ACCEPTANCE_BACKTESTS.md and the baseline JSON referencing the same run_id

Each strategy section includes:

  • What to run (file + windows)
  • What to validate (sanity checks)
  • Anchor + history rows (metrics + wall time + machine specs)

1) AAPL Deep Dip Calls (GOOG; file name says AAPL)

  • File: Demos/AAPL Deep Dip Calls (Copy 4).py
  • CI window (BACKTESTING_START/END): 2020-01-01 → 2025-12-01
  • Validate:
    • trades occur in multiple “dip eras” (2020/2022/2025)
    • no obvious split-cliff behavior (GOOG mid-2022)
    • artifacts are produced (*_trades.csv/html, *_stats.csv, *_tearsheet.html, *_settings.json)
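A split cliff shows up as a one-bar price drop of roughly the split ratio (20:1 for GOOG’s 2022 split). This heuristic sketch flags such bars in a daily close series; the 2x threshold and the helper name are assumptions, and a flagged bar still needs a manual look.

```python
def split_cliffs(closes: list[float], min_drop_ratio: float = 2.0) -> list[int]:
    """Indices i where closes[i-1] / closes[i] >= min_drop_ratio (a split-sized gap).

    Assumption: genuine one-day moves rarely halve a large-cap stock's price,
    so a drop this large usually means unadjusted split data.
    """
    return [
        i
        for i in range(1, len(closes))
        if closes[i] > 0 and closes[i - 1] / closes[i] >= min_drop_ratio
    ]
```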

Expected Results (ThetaData / S3 v44)

  • Correct Total Return = 846.00%
  • Correct CAGR = 48.18%
  • Correct Max DD = -34.30%
  • Observed backtest_time_seconds (macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8) = 34.3s
  • CI cap (seconds) <= 300
  • Baseline run_id = AAPLDeepDipCalls_2026-01-05_09-33_aLJW35
| run_id | lumibot | window | wall_time_s | total_return | cagr | max_dd | flags | machine |
|---|---|---|---|---|---|---|---|---|
| AAPLDeepDipCalls_2025-12-25_19-08_WHRsPm | (unknown) | 2020-01-01 → 2025-11-30 | (n/a) | 865% | 48.72% | -33.08% | (unknown) | (unknown) |
| AAPLDeepDipCalls_2026-01-02_10-25_3KsjXy | 4.4.21 | 2020-01-01 → 2025-11-30 | 237.5 | 870% | 48.86% | -34.09% | prod-like | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| AAPLDeepDipCalls_2026-01-04_06-02_3HO2Ds | 4.4.24 | 2020-01-01 → 2025-11-30 | 26.9 | 862% | 48.63% | -34.09% | historical (superseded; pre daily-bar end-row fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| AAPLDeepDipCalls_2026-01-04_11-14_lIPHBU | 4.4.24 | 2020-01-01 → 2025-11-30 | 77.8 | 853% | 48.36% | -34.3% | historical (superseded; baseline updated under v44 cache semantics) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| AAPLDeepDipCalls_2026-01-04_19-52_w1yl8v | 4.4.24 | 2020-01-01 → 2025-11-30 | 30.6 | 863% | 48.65% | -34.3% | historical (superseded; daily fill alignment fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| AAPLDeepDipCalls_2026-01-05_09-33_aLJW35 | 4.4.27 | 2020-01-01 → 2025-11-30 | 34.3 | 846% | 48.18% | -34.30% | prod-like (baseline; v44; queue-free; daily fill alignment fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |

2) Alpha Picks LEAPS (Call Debit Spread)

  • File: Demos/Leaps Buy Hold (Alpha Picks).py
  • CI window (BACKTESTING_START/END): 2025-10-01 → 2025-10-15 (must trade UBER, CLS, MFC)
  • Historical / optional longer window (not in CI): 2025-01-01 → 2025-12-01
  • Validate:
    • short window trades include both legs for UBER, CLS, and MFC
    • full-year run (when used manually) produces artifacts (symbols may vary; log skip reasons)
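A sketch of the short-window check, assuming the trades CSV has a `symbol` column holding the underlying and a `strike` column per leg. Those column names are assumptions, not the documented *_trades.csv schema; a debit spread is treated as “both legs present” when at least two distinct strikes traded.

```python
import csv
import io
from collections import defaultdict

REQUIRED_UNDERLYINGS = ("UBER", "CLS", "MFC")


def underlyings_missing_both_legs(trades_csv: str, required=REQUIRED_UNDERLYINGS) -> list[str]:
    """Return required underlyings that traded fewer than two distinct strikes."""
    strikes: dict[str, set[str]] = defaultdict(set)
    for row in csv.DictReader(io.StringIO(trades_csv)):
        strikes[row["symbol"]].add(row["strike"])  # assumed column names
    return [sym for sym in required if len(strikes[sym]) < 2]
```

Usage: `underlyings_missing_both_legs(Path(trades_path).read_text())` should return an empty list for the CI window.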

Expected Results (ThetaData / S3 v44)

  • Correct Total Return = 0.00%
  • Correct CAGR = 8.64%
  • Correct Max DD = -1.43%
  • Observed backtest_time_seconds (macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8) = 34.6s
  • CI cap (seconds) <= 120
  • Baseline run_id = LeapsCallDebitSpread_2026-01-09_13-04_CVbj44
| run_id | lumibot | window | wall_time_s | total_return | cagr | max_dd | flags | machine |
|---|---|---|---|---|---|---|---|---|
| LeapsCallDebitSpread_2025-12-25_19-14_lLFnSk | (unknown) | 2025-10-01 → 2025-10-15 | (n/a) | 1% | 17.87% | -1.42% | (unknown) | (unknown) |
| LeapsCallDebitSpread_2026-01-02_10-07_OZi6We | 4.4.21 | 2025-10-01 → 2025-10-14 | 44.5 | 0% | 14.46% | -1.42% | prod-like | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| LeapsCallDebitSpread_2026-01-02_10-48_4UtvLT | 4.4.21 | 2025-01-01 → 2025-11-30 | 285.5 | -3% | -3.03% | -19.33% | prod-like | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| LeapsCallDebitSpread_2026-01-04_05-13_ZpmFin | 4.4.24 | 2025-10-01 → 2025-10-14 | 10.8 | 2% | 58.04% | -1.14% | historical (superseded; pre daily-bar end-row fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| LeapsCallDebitSpread_2026-01-04_11-20_vXE88y | 4.4.24 | 2025-10-01 → 2025-10-14 | 5.4 | 0% | 11.81% | -1.16% | historical (superseded; baseline updated under v44 cache semantics) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| LeapsCallDebitSpread_2026-01-04_22-07_IhWXKY | 4.4.25 | 2025-10-01 → 2025-10-14 | 33.3 | 1% | 18.0% | -1.43% | historical (superseded; daily fill alignment fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| LeapsCallDebitSpread_2026-01-09_13-04_CVbj44 | 4.4.30 | 2025-10-01 → 2025-10-14 | 28.6 | 0% | 9.95% | -1.16% | prod-like (baseline; v44; queue-free) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| LeapsCallDebitSpread_2026-01-04_05-53_OtxpYi | 4.4.24 | 2025-01-01 → 2025-11-30 | 10.7 | -74% | -77.61% | -84.04% | historical (superseded; pre daily-bar end-row fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| LeapsCallDebitSpread_2026-01-04_11-22_rnMuYq | 4.4.24 | 2025-01-01 → 2025-11-30 | 7.6 | -4% | -4.34% | -18.39% | prod-like (baseline) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |

3) TQQQ SMA200 (ThetaData vs Yahoo sanity)

  • File: Demos/TQQQ 200-Day MA.py
  • CI window (BACKTESTING_START/END): 2013-01-01 → 2025-12-01
  • Validate:
    • ThetaData result is stable and deterministic for this window
    • Yahoo is used only as an occasional manual parity sanity check (not run in CI)

Expected Results (ThetaData / S3 v44)

  • Correct Total Return = 8,585.00%
  • Correct CAGR = 42.16%
  • Correct Max DD = -48.40%
  • Observed backtest_time_seconds (macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8) = 35.4s
  • CI cap (seconds) <= 180
  • Baseline run_id = TqqqSma200Strategy_2026-01-05_09-35_IibziX

Yahoo one-time baseline (manual note; do not run in CI):

  • run_id = TqqqSma200Strategy_2026-01-04_04-40_2sdaIJ
  • Total Return = 8,272.00%; CAGR = 40.94%; Max DD = -48.82%; backtest_time_seconds = 11.3s
| run_id | lumibot | data_source | window | wall_time_s | total_return | cagr | max_dd | flags | machine |
|---|---|---|---|---|---|---|---|---|---|
| TqqqSma200Strategy_2025-12-25_19-22_UoZ2yn | (unknown) | (unknown) | 2013-01-01 → 2025-11-30 | (n/a) | 8,272% | 40.94% | -48.82% | (unknown) | (unknown) |
| TqqqSma200Strategy_2025-12-25_19-20_cQkd1T | (unknown) | (unknown) | 2013-01-01 → 2025-11-30 | (n/a) | 8,585% | 42.17% | -48.40% | (unknown) | (unknown) |
| TqqqSma200Strategy_2026-01-02_10-24_Uus6vb | 4.4.21 | thetadata | 2013-01-01 → 2025-11-30 | 33.8 | 8,585% | 42.17% | -48.40% | prod-like | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| TqqqSma200Strategy_2026-01-02_10-25_fOI4Ek | 4.4.21 | yahoo | 2013-01-01 → 2025-11-30 | 8.2 | 8,272% | 40.94% | -48.82% | prod-like | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| TqqqSma200Strategy_2026-01-04_04-39_xX9si4 | 4.4.24 | thetadata | 2013-01-01 → 2025-11-30 | 12.1 | 8,774% | 42.4% | -48.4% | historical (superseded; pre daily-bar end-row fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| TqqqSma200Strategy_2026-01-04_11-10_Wa65DX | 4.4.24 | thetadata | 2013-01-01 → 2025-11-30 | 16.3 | 8,585% | 42.16% | -48.4% | historical (superseded; daily fill alignment fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| TqqqSma200Strategy_2026-01-04_04-40_2sdaIJ | 4.4.24 | yahoo | 2013-01-01 → 2025-11-30 | 11.3 | 8,272% | 40.94% | -48.82% | prod-like (baseline) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| TqqqSma200Strategy_2026-01-05_09-35_IibziX | 4.4.27 | thetadata | 2013-01-01 → 2025-11-30 | 35.4 | 8,585% | 42.16% | -48.40% | prod-like (baseline; v44; queue-free; daily fill alignment fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |

4) Backdoor Butterfly 0DTE (regular fills; index + index options)

  • File: Demos/Backdoor Butterfly 0 DTE (Copy).py
  • Validate:
    • no crashes due to index placeholder tails / missing history
    • artifacts produced

CI uses the full-year window; the shorter window is retained only as historical speed context:

  • CI window (BACKTESTING_START/END): 2025-01-01 → 2025-12-01
  • Historical / speed baseline (not in CI): 2025-01-01 → 2025-11-30

Expected Results (ThetaData / S3 v44)

  • Correct Total Return = -19.00%
  • Correct CAGR = -21.02%
  • Correct Max DD = -26.30%
  • Observed backtest_time_seconds (macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8) = 292.0s
  • CI cap (seconds) <= 900
  • Baseline run_id = BackdoorButterfly0DTE_2026-01-05_09-36_7AP0H8
| run_id | lumibot | window | wall_time_s | total_return | cagr | max_dd | flags | machine |
|---|---|---|---|---|---|---|---|---|
| BackdoorButterfly0DTE_2025-12-25_18-29_KAD4Qk | (unknown) | 2025-01-01 → 2025-11-30 | (n/a) | -26% | -28.55% | -32.51% | (unknown) | (unknown) |
| BackdoorButterfly0DTE_2025-12-31_15-43_TWzKau | 4.4.20 | 2025-01-01 → 2025-11-30 | 79.8 | -22% | -24.00% | -30.13% | prod-like | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| BackdoorButterfly0DTE_2026-01-02_10-29_HPNuUM | 4.4.21 | 2025-01-01 → 2025-11-30 | 267.8 | -19% | -20.79% | -25.94% | prod-like | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| BackdoorButterfly0DTE_2026-01-02_18-52_XdYcWQ | 4.4.21 | 2025-01-01 → 2025-11-29 | 121.6 | -21% | -23.12% | -26.42% | prod-like | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| BackdoorButterfly0DTE_2026-01-04_06-26_S1FSC2 | 4.4.24 | 2025-01-01 → 2025-11-30 | 120.5 | -20% | -22.1% | -25.48% | historical (superseded; pre daily-bar end-row fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| BackdoorButterfly0DTE_2026-01-04_11-40_1VPPZ9 | 4.4.24 | 2025-01-01 → 2025-11-30 | 119.4 | -21% | -23.15% | -26.45% | historical (superseded; multileg parent NEW_ORDER dispatch nondeterminism) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| BackdoorButterfly0DTE_2026-01-04_06-24_KaizMH | 4.4.24 | 2025-01-01 → 2025-11-29 | 116.4 | -21% | -23.37% | -26.62% | historical (superseded; pre daily-bar end-row fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| BackdoorButterfly0DTE_2026-01-04_11-27_KD9Qi0 | 4.4.24 | 2025-01-01 → 2025-11-29 | 118.0 | -21% | -23.11% | -26.41% | historical (superseded; minor rounding drift) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| BackdoorButterfly0DTE_2026-01-04_11-33_sBKYi2 | 4.4.24 | 2025-01-01 → 2025-11-29 | 118.1 | -21% | -23.12% | -26.42% | historical (pre deterministic multileg parent dispatch) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| BackdoorButterfly0DTE_2026-01-04_17-14_Dg5NJR | 4.4.24 | 2025-01-01 → 2025-11-30 | 235.2 | -21% | -23.12% | -26.42% | historical (superseded; daily fill alignment fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| BackdoorButterfly0DTE_2026-01-05_09-36_7AP0H8 | 4.4.27 | 2025-01-01 → 2025-11-30 | 292.0 | -19% | -21.02% | -26.30% | expected baseline (v44; queue-free; daily fill alignment fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |

5) MELI Deep Drawdown Calls

  • File: Demos/Meli Deep Drawdown Calls.py
  • CI window (BACKTESTING_START/END): 2013-01-01 → 2025-12-18
  • Validate:
    • entry trades occur (drawdown-triggered buys)
    • no “sawtooth” PV caused by missing option marks (forward-fill behavior remains stable)
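Why a missing mark produces a sawtooth, in a minimal sketch: valuing a position at 0 on a day with no mark drops PV for one bar and snaps back the next, while forward-filling the last known mark keeps PV continuous. Both helpers are hypothetical and model a single option position held alongside cash.

```python
from typing import Optional


def forward_fill_marks(marks: list[Optional[float]]) -> list[Optional[float]]:
    """Replace missing marks with the most recent known mark (None before the first)."""
    filled: list[Optional[float]] = []
    last: Optional[float] = None
    for mark in marks:
        if mark is not None:
            last = mark
        filled.append(last)
    return filled


def position_values(cash: float, quantity: int, multiplier: int,
                    marks: list[Optional[float]], ffill: bool = True) -> list[float]:
    """Portfolio value per bar; an unfilled missing mark is (wrongly) valued at 0."""
    series = forward_fill_marks(marks) if ffill else marks
    return [cash + quantity * multiplier * (m if m is not None else 0.0) for m in series]
```

With `cash=1000`, one contract, a 100 multiplier, and marks `[2.0, None, 2.5]`, the unfilled series is `[1200.0, 1000.0, 1250.0]` (the sawtooth) and the forward-filled series is `[1200.0, 1200.0, 1250.0]`.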

This strategy was previously under investigation for baseline mismatch; CI uses the row marked expected baseline.

Expected Results (ThetaData / S3 v44)

  • Correct Total Return = 14.00%
  • Correct CAGR = 1.11%
  • Correct Max DD = -99.30%
  • Observed backtest_time_seconds (macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8) = 122.6s
  • CI cap (seconds) <= 300
  • Baseline run_id = MeliDeepDrawdownCalls_2026-01-09_13-04_SZkY2g
| run_id | lumibot | window | wall_time_s | total_return | cagr | max_dd | status | machine |
|---|---|---|---|---|---|---|---|---|
| MeliDeepDrawdownCalls_2025-12-25_20-38_33bGtY | (unknown) | 2013-01-01 → 2025-12-17 | (n/a) | 131% | 7.26% | -97.78% | expected (historical anchor) | (unknown) |
| MeliDeepDrawdownCalls_2026-01-02_10-09_7yisFp | 4.4.21 | 2013-01-01 → 2025-12-17 | 856.3 | -91% | -18.22% | -99.73% | under investigation | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| MeliDeepDrawdownCalls_2026-01-02_19-24_kZELl5 | 4.4.21 | 2013-01-01 → 2025-12-17 | 350.4 | 14% | 1.08% | -98.26% | under investigation (daily snapshot NBBO override) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| MeliDeepDrawdownCalls_2026-01-04_09-39_hyg1f1 | 4.4.24 | 2013-01-01 → 2025-12-17 | 18.1 | 82% | 5.12% | -98.2% | historical (superseded; pre daily-bar end-row fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| MeliDeepDrawdownCalls_2026-01-04_11-05_y7Ap6O | 4.4.24 | 2013-01-01 → 2025-12-17 | 41.1 | -89% | -16.83% | -98.96% | historical (superseded; stale snapshot placeholder refetch caused missing option marks) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| MeliDeepDrawdownCalls_2026-01-05_02-18_dKzthV | 4.4.25 | 2013-01-01 → 2025-12-17 | 111.3 | 104% | 6.16% | -98.17% | historical (superseded; daily fill alignment fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| MeliDeepDrawdownCalls_2026-01-09_13-04_SZkY2g | 4.4.30 | 2013-01-01 → 2025-12-17 | 98.4 | 14% | 1.11% | -99.30% | expected baseline (v44; queue-free) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |

See: docs/investigations/ACCURACY_AUDIT_2026-01-02.md for the divergence notes and first-diff audit plan.

6) Backdoor Butterfly with SmartLimit

  • File: Demos/Backdoor Butterfly 0 DTE (Copy) - with SMART LIMITS.py
  • CI window (BACKTESTING_START/END): 2025-01-01 → 2025-12-01
  • Validate:
    • completes without stalling
    • artifacts produced
    • SmartLimit fills behave like “mid + slippage” (net multi-leg), not bid/ask worst-case
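The difference between “mid + slippage” and bid/ask worst case for a net multi-leg order, sketched under a stated sign convention: positive quantity = buy, and a positive net price = debit paid. With that convention, adding slippage is always adverse. This is an illustration of the pricing idea, not LumiBot’s SmartLimit implementation; `Leg` and the helpers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    quantity: int  # +N = buy N, -N = sell N (per spread unit)
    bid: float
    ask: float


def net_mid(legs: list[Leg]) -> float:
    """Net mid price of the spread (positive = debit)."""
    return sum(leg.quantity * (leg.bid + leg.ask) / 2 for leg in legs)


def net_worst_case(legs: list[Leg]) -> float:
    """Cross every leg: buys pay the ask, sells receive the bid."""
    return sum(leg.quantity * (leg.ask if leg.quantity > 0 else leg.bid) for leg in legs)


def smart_limit_fill(legs: list[Leg], slippage: float) -> float:
    """Simulated net fill: net mid worsened by a fixed slippage amount."""
    return net_mid(legs) + slippage
```

For a 1x-2x1 butterfly with legs (5.00/5.20), (3.00/3.20), and (1.90/2.10), the net mid is about 0.90, the worst case is about 1.30, and a 0.05 slippage fill is about 0.95, well inside the crossed price.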

Expected Results (ThetaData / S3 v44)

  • Correct Total Return = -3.00%
  • Correct CAGR = -3.82%
  • Correct Max DD = -13.38%
  • Observed backtest_time_seconds (macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8) = 290.8s
  • CI cap (seconds) <= 900
  • Baseline run_id = BackdoorButterfly0DTESmartLimit_2026-01-05_09-44_qLKdxw
| run_id | lumibot | window | wall_time_s | total_return | cagr | max_dd | flags | machine |
|---|---|---|---|---|---|---|---|---|
| BackdoorButterfly0DTESmartLimit_2026-01-02_10-34_UTFoHq | 4.4.21 | 2025-01-01 → 2025-11-30 | 283.0 | -3% | -2.96% | -13.58% | prod-like | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| BackdoorButterfly0DTESmartLimit_2026-01-02_19-49_QXkWuB | 4.4.21 | 2025-01-01 → 2025-11-29 | 107.1 | -6% | -6.2% | -13.39% | prod-like | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| BackdoorButterfly0DTESmartLimit_2026-01-04_06-29_NduXK0 | 4.4.24 | 2025-01-01 → 2025-11-30 | 120.4 | -6% | -6.42% | -14.88% | historical (superseded; baseline updated under v44 cache semantics) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| BackdoorButterfly0DTESmartLimit_2026-01-04_20-26_qtvxlf | 4.4.24 | 2025-01-01 → 2025-11-30 | 273.1 | -6% | -6.43% | -13.41% | historical (superseded; daily fill alignment fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| BackdoorButterfly0DTESmartLimit_2026-01-05_09-44_qLKdxw | 4.4.27 | 2025-01-01 → 2025-11-30 | 290.8 | -3% | -3.82% | -13.38% | prod-like (baseline; v44; queue-free; daily fill alignment fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |

7) SPX Short Straddle Intraday (production stall repro)

  • File: Demos/SPX Short Straddle Intraday (Copy).py
  • Validate:
    • no “silent hang” (logs continue via heartbeats while waiting)
    • run continues progressing; no permanent download_status.active=true without logs
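The heartbeat requirement can be sketched with `threading.Event.wait(timeout)`: each timeout that expires before the download finishes emits a log line, so a long wait is never silent. `wait_with_heartbeat` is a hypothetical helper; it has no overall deadline, which a real caller would likely add.

```python
import threading
import time
from typing import Callable


def wait_with_heartbeat(
    done: threading.Event,
    heartbeat_s: float,
    log: Callable[[str], None],
    what: str = "download",
) -> int:
    """Block until `done` is set, logging every heartbeat_s seconds; return the heartbeat count."""
    beats = 0
    start = time.monotonic()
    # Event.wait returns False when the timeout expires without the event being set.
    while not done.wait(timeout=heartbeat_s):
        beats += 1
        log(f"still waiting for {what} ({time.monotonic() - start:.1f}s elapsed)")
    return beats
```

Usage: pass `logger.info` as `log`, so a stuck `download_status.active=true` state always has accompanying log lines.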

CI uses the stall repro / prod parity window; the shorter window is retained only as historical speed context:

  • CI window (BACKTESTING_START/END): 2025-01-06 → 2025-12-26
  • Historical / speed baseline (not in CI): 2025-01-01 → 2025-12-01

Expected Results (ThetaData / S3 v44)

  • Correct Total Return = -11.00%
  • Correct CAGR = -11.72%
  • Correct Max DD = -28.72%
  • Observed backtest_time_seconds (macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8) = 193.2s
  • CI cap (seconds) <= 900
  • Baseline run_id = SPXShortStraddle_2026-01-05_09-49_35TJJl
| run_id | lumibot | window | wall_time_s | total_return | cagr | max_dd | notes | machine |
|---|---|---|---|---|---|---|---|---|
| SPXShortStraddle_2025-12-31_17-16_Ff79Hy | 4.4.20 | 2025-01-01 → 2025-11-30 | 104.8 | -17% | -18.99% | -28.34% | speed baseline (historical) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| SPXShortStraddle_2026-01-02_10-39_XtAwjW | 4.4.21 | 2025-01-06 → 2025-12-25 | 516.8 | -17% | -17.81% | -33.51% | stall repro window | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| SPXShortStraddle_2026-01-02_18-51_1JvQro | 4.4.21 | 2025-01-06 → 2025-12-24 | 63.8 | -17% | -17.5% | -33.51% | stall repro window (perf fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| SPXShortStraddle_2026-01-04_06-37_sHgfVQ | 4.4.24 | 2025-01-01 → 2025-11-30 | 70.7 | -21% | -22.96% | -30.87% | speed baseline (baseline) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| SPXShortStraddle_2026-01-04_06-41_B1jF98 | 4.4.24 | 2025-01-06 → 2025-12-25 | 76.1 | -17% | -17.5% | -33.51% | historical (superseded; daily fill alignment fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |
| SPXShortStraddle_2026-01-05_09-49_35TJJl | 4.4.27 | 2025-01-06 → 2025-12-25 | 193.2 | -11% | -11.72% | -28.72% | stall repro window (baseline; v44; queue-free; daily fill alignment fix) | macOS=26.1; CPU=Apple M3 Max; RAM=48GB; Python=3.11.8 |

Optional: Profiling artifact (opt-in)

To capture a profiler output for production-vs-local speed parity investigations, add:

  • BACKTESTING_PROFILE=yappi

When enabled, LumiBot writes a *_profile_yappi.csv artifact next to the other backtest outputs.