Master Repository Audit — codex-multi-auth

April 25, 2026

Historical snapshot: this audit was captured against codex-multi-auth@1.2.7 before the 2.0 runtime-rotation architecture. Preserve the evidence and findings as historical audit material. For current architecture, use docs/architecture.md and docs/development/ARCHITECTURE.md.

  • Audit Date: 2026-04-17
  • HEAD: 1f6da97d06dcc8c268b304e6e45b6baa9a386679
  • Branch: main
  • Package: codex-multi-auth@1.2.7
  • Node: (captured in evidence/context.txt)
  • Audit methodology: Sisyphus multi-wave plan executed under Atlas; evidence under docs/audits/evidence/
  • Composition note: Findings composed from dimension deep-dives (dim-C through dim-P). Several dimensions composed by Atlas from captured sub-agent analysis when agents exceeded step budgets before writing deliverables.

Severity Rubric

  • CRITICAL: token/auth corruption, data loss, unsafe credential handling, or core trust breakage
  • HIGH: likely real-world operational pain, hard-to-debug failures, serious maintainability or resilience risk, bypassable security
  • MEDIUM: meaningful architecture/testing/DX weakness, degraded UX
  • LOW: cleanup, polish, consistency

Dimensions Audited (Coverage Matrix)

| Dim | Area | Primary Section | Evidence File |
| --- | --- | --- | --- |
| Dimension A | Product / system understanding | §1 Executive, §2 System Map | inventory.txt, context.txt |
| Dimension B | Architecture | §2 System Map, §8 Refactors, §16 Modules | all dim-*.md |
| Dimension C | Auth / OAuth / token lifecycle | §5 HIGH (H4, H5), §10 Security | dim-C-auth.md |
| Dimension D | Multi-account / routing / failover | §5 HIGH (H2, H3, H10), §6 MEDIUM | dim-D-routing.md |
| Dimension E | Storage / filesystem / state | §5 HIGH (H1), §6 MEDIUM (M01-M05) | dim-E-storage.md |
| Dimension F | Config / settings / precedence | §5 HIGH (H6), §6 MEDIUM (M21-M23) | dim-F-config.md |
| Dimension G | CLI / UX | §6 MEDIUM (M24-M28), §12 | dim-G-cli.md |
| Dimension H | Request / SSE / resilience | §5 HIGH (H9), §6 MEDIUM (M16-M19) | dim-H-request.md |
| Dimension I | Type safety / validation | §6 MEDIUM (M20), §10 | dim-I-types.md |
| Dimension J | Error handling | §6 MEDIUM (M29), §10 | dim-JN-errors-health.md |
| Dimension K | Tests | §5 HIGH (H8), §11 | dim-K-tests.md |
| Dimension L | Release / CI / OSS | §5 HIGH (H7, H8), §12 | dim-LM-release-docs.md |
| Dimension M | Docs accuracy | §5 HIGH (H5, H8), §12 | dim-LM-release-docs.md + docs-claims.txt |
| Dimension N | Code health / cleanup | §6 MEDIUM (M30, M31), §7 | dim-JN-errors-health.md |
| Dimension O | Features | §9 | feature-recs inline |
| Dimension P | Perf (lightweight) | §6 MEDIUM (M34, M35), §7 (L11) | dim-P-perf.md |

Table of Contents

  1. Executive Summary
  2. System Map
  3. What Is Already Strong
  4. Critical Issues
  5. High-Priority Improvements
  6. Medium Improvements
  7. Low-Priority Cleanups
  8. Refactoring Plan
  9. Feature Recommendations
  10. Security / Trust Review
  11. Testing Gap Analysis
  12. CLI / DX / Docs Review
  13. Quick Wins
  14. Phased Implementation Roadmap
  15. Top 20 Recommended Actions
  16. Module-by-Module Notes
  17. Final Verdict

1. Executive Summary

Maturity: 4/5. codex-multi-auth is a structurally healthy, security-conscious, CLI-first OAuth manager. Supporting evidence: strict TypeScript; centralized Zod schemas for high-risk payloads; no-explicit-any enforced (verified: 0 occurrences); no @ts-ignore; clean typecheck, lint, audit:ci, and vendor:verify runs; a recent security-dependency bump cadence; a full OSS governance stack (SECURITY.md, CODE_OF_CONDUCT.md, CONTRIBUTING.md, LICENSE); 4 CI workflows including CodeQL; and 3418 tests across 225 files with verified hermetic execution under redirected HOME + CODEX_MULTI_AUTH_DIR.

Biggest strengths (preserve):

  • Hermeticity: tests do not leak to real ~/.codex/multi-auth/ when env-redirected — zero delta verified (see K-05)
  • Strict TypeScript discipline: 0 as any, 0 @ts-ignore across lib/ + index.ts (I-01, I-02)
  • Refresh-queue race prevention: token-keyed dedupe + rotation handoff + rollback on persist fail (C-10)
  • Atomic writes for primary account storage + flagged storage + unified settings: temp+rename pattern used (E-03 positive)
  • Request-loop termination safety: 4 independent guards prevent infinite retry (H-09)
  • OSS readiness: governance files complete; CodeQL workflow + dep scanner workflow in CI (LM-03, LM-08)

Biggest risks:

  1. resolvePath() path-guard regression on HEAD — test/paths.test.ts:846 fails; lookalike-prefix paths outside the home directory are not rejected; gates import/export, so a guard failure can redirect reads/writes outside approved roots (E-01, K-02)
  2. Hybrid account selector can return blocked/unavailable accounts — selectHybridAccount() falls back to LRU even when available.length === 0; fetch loop trusts it without re-validating (D-01)
  3. Plugin config precedence bug — loadPluginConfig does not prefer primary CONFIG_PATH over CODEX_HOME legacy path; test/plugin-config.test.ts:417 fails on HEAD (F-01, K-04)
  4. Live OAuth URL leaks to stdout/clipboard — browser-fallback and manual login print raw URL containing live state and code_challenge (C-AUTH-05)
  5. Docs-to-code drift: AGENTS.md claims v0.1.x / "87 files, 2071 tests" — reality is v1.2.7 / 225 files / 3418 tests; README/docs claim canonical redirect 127.0.0.1:1455 but code uses localhost:1455 (LM-02, LM-12, C-AUTH-03)

Top 5 priorities for the next cycle:

  1. Fix resolvePath() lookalike bypass + plugin-config precedence + codex-manager-cli auth list message drift (the 3 failing tests are all real regressions)
  2. Re-validate account availability after selectHybridAccount() OR change its contract to return null when no account is available
  3. Redact OAuth URL in user-facing output (show host/port only, keep full URL for clipboard/browser handoff)
  4. Fix pack:check bloat and truth-up AGENTS.md + docs redirect host
  5. Split settings-hub.ts (2100 LOC) into sub-concern files (theme, accounts, sync, diagnostics, experimental)
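
Priority 3 (redacting the OAuth URL in user-facing output) could look roughly like the sketch below. The helper name and the list of sensitive parameters are assumptions for illustration, not code from the audited repository; the full URL would still be handed to the clipboard or browser untouched.

```typescript
// Query parameters treated as sensitive for display purposes (assumed list).
const SENSITIVE_PARAMS = ["state", "code_challenge"];

// Hypothetical helper: masks sensitive values so the URL is safe to print.
function redactAuthUrlForDisplay(rawUrl: string): string {
  const url = new URL(rawUrl);
  for (const name of SENSITIVE_PARAMS) {
    if (url.searchParams.has(name)) {
      url.searchParams.set(name, "REDACTED");
    }
  }
  return url.toString();
}
```

Masking values rather than dropping the query string keeps the printed URL recognizable to the user while removing the live secrets.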

2. System Map

Architecture (inferred)

User (CLI/terminal)
  │
  ▼
scripts/codex.js  (bin wrapper — lazy-load auth runtime)
  │
  ▼
lib/codex-manager.ts  (command dispatcher)
  ├── codex auth login|status|check|list|switch|forecast|verify-flagged|fix|doctor|report
  │
  ├─▶ lib/auth/  ────────────────┐
  │   auth.ts (PKCE, JWT, token)  │
  │   server.ts (callback :1455)  │
  │   browser.ts                  ▼
  │                        OAuth 2.0 Authorization Code + PKCE
  │                        https://auth.openai.com/...
  │
  ├─▶ lib/accounts.ts + lib/accounts/**  ──▶  lib/rotation.ts  ──▶  lib/health.ts
  │                                            (hybrid/round-robin/sticky)
  │                                                  │
  │                                                  ▼
  ├─▶ lib/request/** (index.ts 7-step pipeline)  ──▶  lib/circuit-breaker.ts
  │   URL rewrite → init → transform → continuation → account iter/refresh/headers
  │   → fetch+timeout+retry/rotation → success+SSE+failover
  │                                                  │
  │                                                  ▼
  │                                    OpenAI Codex backend (ChatGPT routing)
  │
  ├─▶ lib/storage.ts + lib/storage/**  ──▶  atomic writes
  │   V1 ↔ V3 migrations, worktree resolution, EBUSY retry
  │   ~/.codex/multi-auth/ or CODEX_MULTI_AUTH_DIR
  │
  ├─▶ lib/codex-manager/settings-hub.ts  (2100 LOC dashboard TUI)
  │
  └─▶ lib/ui/**  (ansi, auth-menu, theme, select, copy)

Storage/config flow

  • Root: ~/.codex/multi-auth/ (override: CODEX_MULTI_AUTH_DIR)
  • Settings: settings.json (EBUSY/EPERM retry, max 4 exponential)
  • Accounts (global): openai-codex-accounts.json
  • Accounts (project-scoped): projects/<project-key>/openai-codex-accounts.json
  • Flagged: openai-codex-flagged-accounts.json
  • Quota cache: quota-cache.json
  • Runtime observability: runtime-observability.json
  • Logs: logs/codex-plugin/
  • Config path: CODEX_MULTI_AUTH_CONFIG_PATH primary, CODEX_HOME legacy (precedence currently BUGGY — F-01)
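
The root override and the intended config precedence (primary before legacy) can be sketched as two small pure functions. Both function names are hypothetical, and the candidate paths are passed in by the caller because the audit does not spell out the legacy file location under CODEX_HOME.

```typescript
import * as path from "node:path";

type Env = Record<string, string | undefined>;

// Storage root: CODEX_MULTI_AUTH_DIR wins, otherwise ~/.codex/multi-auth.
function resolveStorageRoot(env: Env, homeDir: string): string {
  const override = env.CODEX_MULTI_AUTH_DIR;
  if (override !== undefined && override.trim() !== "") return override;
  return path.join(homeDir, ".codex", "multi-auth");
}

// Config precedence: the primary path is preferred whenever it exists;
// the legacy path is only a fallback. `exists` is injected for testability.
function pickConfigPath(
  primary: string | undefined,
  legacy: string | undefined,
  exists: (p: string) => boolean,
): string | null {
  for (const candidate of [primary, legacy]) {
    if (candidate !== undefined && exists(candidate)) return candidate;
  }
  return null;
}
```

Injecting `exists` makes the "both files present" case from F-01 easy to cover in a precedence test matrix without touching the real filesystem.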

Trust boundaries

  1. User ↔ CLI (low — stdin + fs)
  2. CLI ↔ OS keychain / file system (MEDIUM — tokens persisted as plaintext JSON, mode 0600; larger-than-minimum secret footprint — C-AUTH-08)
  3. CLI ↔ Browser / Localhost callback (HIGH — :1455 bound on any-interface; redirect URI mismatch risk C-AUTH-03)
  4. CLI ↔ auth.openai.com (HIGH — OAuth endpoint; hardcoded single host, no allowlist abstraction C-AUTH-12)
  5. CLI ↔ ChatGPT backend (HIGH — headers, rate-limits, model routing; failover across multiple accounts)
  6. Multi-account isolation (MEDIUM — project-scoped storage can silently collapse to global when Codex CLI sync enabled D-06)

Highest-risk boundaries

  • lib/storage/paths.ts::resolvePath() — path-guard for import/export (HIGH — regression at HEAD, E-01/K-02)
  • lib/auth/auth.ts::REDIRECT_URI ↔ lib/auth/server.ts bind (HIGH — host mismatch C-AUTH-03)
  • lib/accounts.ts::selectHybridAccount() ↔ fetch loop in index.ts (HIGH — unavailable-account selection D-01)
  • Token file at rest — plaintext JSON with refresh tokens + cached access tokens (MEDIUM — C-AUTH-08)

3. What Is Already Strong

These decisions work well and should be preserved through any refactor:

| Strength | Evidence | Preserve by |
| --- | --- | --- |
| Hermetic test design: HOME / CODEX_MULTI_AUTH_DIR redirection produces zero drift under full suite | K-05 | Keep env-redirect pattern; add regression test asserting hermeticity when new tests land |
| Strict TypeScript with zero escape hatches: 0 as any, 0 @ts-ignore, 0 @ts-expect-error, 0 @ts-nocheck in lib/ + index.ts; strict: true in tsconfig | I-01, I-02, I-05 | Keep ESLint flat-config no-explicit-any rule; add CI gate if not present |
| Refresh-queue race prevention with token-keyed dedupe + rotation handoff + rollback on persist fail | C-10 | Add a targeted regression test around rotation-then-persist-failure rollback if coverage does not already lock it |
| Atomic writes for primary account storage, flagged storage, unified settings, export | E-03 (positive) | Extend same pattern to recovery/session storage (currently a violation, see E-03) |
| 4-gate request-loop termination (attempted.size, outbound budget, MAX_SHORT_RETRY_ATTEMPTS=3, MAX_STREAM_FAILOVERS=1) | H-09 | Document the gates inline; add invariant test |
| Defensive storage corruption recovery: checksum-protected WAL + rotating backups | C-AUTH-09 | Add observability signal when WAL/backup recovery is used so operators notice silent recovery |
| Structured CLI doctrine: doctor --fix dry-run safe; Q=cancel hotkey consistent; theme live-preview with baseline restore on cancel | G-07, G-05 | Apply same --dry-run discipline to new repair commands |
| OSS governance: SECURITY.md, CODE_OF_CONDUCT, CONTRIBUTING, LICENSE, issue/PR templates, CodeQL + plugin scanner + dep scanner workflows | LM-03, LM-08 | Keep the full governance stack |
| Clean supply chain: audit:ci (prod + dev allowlist) + vendor:verify + bundleDependencies + npm overrides | LM-04, LM-05 | Keep; add per-release vendor manifest hash pinning if not present |
| Active security maintenance: recent hono 4.12.14, vite ^7.3.2 bumps at HEAD | LM-09 | Keep the Dependabot config and its update cadence |
| Refresh-failure taxonomy in lib/refresh-guardian.ts: rate-limit/auth/network bucketed cooldowns; missing-refresh accounts auto-disabled | C-AUTH-13 | Surface bucketed states in operator diagnostics |
| Strong OAuth state generation: 16 bytes from node:crypto = 128-bit CSRF entropy | C-AUTH-02 | Add invariant test; keep node:crypto source |
| Rich CLI surface organized by Start/Daily/Repair/Advanced per README | LM-11 | Keep README structure; keep separation of repair commands from daily ops |
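
The suggested invariant test for the four termination gates could be built around a predicate like the one below. This is a sketch under assumptions: the two constants carry the values quoted above, but the state shape, the outbound budget parameter, and the way the real loop combines the gates are not shown in the audit and are hypothetical here.

```typescript
// Limits quoted in the audit; the outbound budget value is not given there,
// so it is passed in as a parameter.
const MAX_SHORT_RETRY_ATTEMPTS = 3;
const MAX_STREAM_FAILOVERS = 1;

// Hypothetical snapshot of the request loop's counters.
interface LoopState {
  attempted: Set<string>; // account ids already tried
  accountCount: number;
  outboundRequests: number;
  shortRetries: number;
  streamFailovers: number;
}

// Returns the name of the first gate that forbids another attempt,
// or null if the loop may continue.
function terminationGate(state: LoopState, outboundBudget: number): string | null {
  if (state.attempted.size >= state.accountCount) return "attempted.size";
  if (state.outboundRequests >= outboundBudget) return "outbound budget";
  if (state.shortRetries >= MAX_SHORT_RETRY_ATTEMPTS) return "MAX_SHORT_RETRY_ATTEMPTS";
  if (state.streamFailovers >= MAX_STREAM_FAILOVERS) return "MAX_STREAM_FAILOVERS";
  return null;
}
```

An invariant test can then assert that, for any reachable state, at least one gate eventually returns non-null as the counters only ever increase.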

4. Critical Issues

Updated 2026-04-17 post-Oracle review — see docs/audits/evidence/oracle-verdicts.md. Original draft had no CRITICAL findings; Oracle elevated AUDIT-H1 upon determining the failing unit test IS the reproduction.

| ID | Severity | Claim | Evidence | Confidence | Impact | Fix direction |
| --- | --- | --- | --- | --- | --- | --- |
| AUDIT-C1 | CRITICAL | resolvePath() lookalike-prefix bypass: path-guard for import/export does not reject lookalike prefix paths outside home directory. Trust boundary broken: import/export surfaces can read from attacker-controlled siblings or write outside approved roots. Maps to "unsafe credential handling" + "core trust breakage" in severity rubric. | test/paths.test.ts:842-846 (FAILING on HEAD); lib/storage/paths.ts:333-357; cross-ref AUDIT-H1, E-01, K-02 | confirmed (failing test = code-level reproduction) | Local-access attacker creates lookalike directory (e.g. <HOME>/.codex-multi-auth-evil/) that bypasses guard; affects both read + write surfaces | Harden isWithinDirectory() with canonicalization; add regression tests for home/project/tmp lookalikes on Windows + POSIX. Block next release on fix. |
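
A containment check that resists lookalike prefixes can be sketched with path.relative instead of a string-prefix comparison. This is an illustrative version only; the audited isWithinDirectory() may be structured differently, and symlink canonicalization (for example via fs.realpathSync) is deliberately left out.

```typescript
import * as path from "node:path";

// Sketch of a hardened containment check (not the repository's code).
// A candidate is inside `root` only if the relative path from root to
// candidate does not climb out with ".." and is not absolute (which
// happens on Windows when the two paths are on different drives).
function isWithinDirectory(root: string, candidate: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(candidate));
  if (rel === "") return true; // candidate is the root itself
  const escapesRoot = rel === ".." || rel.startsWith(".." + path.sep);
  return !escapesRoot && !path.isAbsolute(rel);
}
```

Because the comparison works on path segments, a sibling such as `/home/u/.codex-evil` yields a relative path starting with `..` and is rejected, whereas a plain `startsWith("/home/u/.codex")` test would accept it.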

Post-Oracle severity adjustments (applied to findings-index.json with oracle_adjusted flag):

| Original | Oracle Verdict | Finding | Rationale |
| --- | --- | --- | --- |
| HIGH | CRITICAL (elevated) | AUDIT-H1 resolvePath lookalike | Failing test = reproduction; path-guard failure = trust breakage |
| MEDIUM | HIGH (elevated) | AUDIT-M09 project-scope silent bypass on CLI sync | Silent credential leak across projects matches HIGH criteria |
| HIGH | MEDIUM (demoted) | AUDIT-H6 loadPluginConfig precedence | Workaround exists; narrow blast radius |
| HIGH | MEDIUM (demoted) | AUDIT-H8 AGENTS.md staleness | Docs drift only, no runtime impact |
| HIGH | MEDIUM (demoted) | AUDIT-H9 SSE malformed-chunk discard | Dim-H internally marked MEDIUM; reconciled |
| HIGH | MEDIUM (demoted) | AUDIT-H10 dangling active pointer | Operator-facing friction, no credential dimension |
| MEDIUM | HIGH (conditional) | AUDIT-M13 plaintext tokens at rest | Elevates to HIGH only if AUDIT-C1 ships unfixed |

Final severity distribution (post-Oracle): 2 CRITICAL · 6 HIGH · 38 MEDIUM · 14 LOW (60 total incl. AUDIT-C1).

Oracle top-3 refactor verdict (confirmed with rationale, see oracle-verdicts.md §2):

  1. R2 RedirectURI SSOT: closes AUDIT-H5 + eliminates drift class
  2. R3 Zod at JSON.parse boundaries: additive, fail-closed, scales to 59 sites
  3. R4 Routing mutex + selection-record: closes AUDIT-H2/H3/D-09 simultaneously

Oracle assumptions flagged for follow-up validation (see oracle-verdicts.md §4):

  • Inspect pack:check tarball for .env/fixtures/secrets — if present, AUDIT-H7 → CRITICAL
  • Pin PKCE dep + audit source (C-AUTH-01)
  • Spot-check 3 dim-H citations (salvaged from step-budget-truncated agent)
  • Qualify hermeticity claim: applies to HOME/CODEX_MULTI_AUTH_DIR, NOT to CWD (evidenced by 6 tmp files at repo root)
  • Count lib/schemas.ts schemas vs unique payload shapes across 59 parse sites (R3 cost estimation)

5. High-Priority Improvements

| ID | Severity | Claim | Evidence | Fix direction |
| --- | --- | --- | --- | --- |
| AUDIT-H1 | HIGH | resolvePath() does not reject lookalike-prefix paths on HEAD: import/export path-guard regression; can redirect reads/writes outside approved home/project/tmp roots | E-01; K-02; test/paths.test.ts:842-846, lib/storage/paths.ts:333-357 | Reproduce with real Windows+POSIX lookalike cases, harden isWithinDirectory(), add regression tests |
| AUDIT-H2 | HIGH | Hybrid account selector returns unavailable accounts: selectHybridAccount() falls back to LRU even when available.length === 0; fetch loop trusts it | D-01; lib/rotation.ts:379-392, lib/accounts.ts:668-697, index.ts:1149-1157 | Change selector contract to return null when no account available OR re-run isAccountAvailableForFamily() after hybrid selection |
| AUDIT-H3 | HIGH | Short-window 429 retry does not mark account unavailable before sleeping: concurrent requests keep selecting the same freshly rate-limited account | D-07; index.ts:2089-2114, lib/accounts.ts:534-545,673-677 | Write immediate transient rate-limit marker before short-retry sleep OR reserve account locally until sleep window ends |
| AUDIT-H4 | HIGH | Live OAuth URL leaks to stdout/clipboard: browser-fallback and manual login print raw URL with live state and code_challenge | C-AUTH-05; lib/codex-manager.ts:1825-1841, lib/auth/auth.ts:15-20,262-269, lib/auth/browser.ts:158-177 | Print redacted display URL; keep full URL only for clipboard/open-browser handoff; add --show-full-url debug escape hatch |
| AUDIT-H5 | HIGH | Redirect host drift: code uses localhost:1455 but docs claim 127.0.0.1:1455 | C-AUTH-03; lib/auth/auth.ts:12, lib/auth/server.ts:78-80,104, docs/reference/commands.md:84,98, CHANGELOG.md:125 | Choose one canonical callback origin; derive all user-facing strings from it; align docs/tests |
| AUDIT-H6 | HIGH | loadPluginConfig CONFIG_PATH precedence bug: does not prefer primary over legacy CODEX_HOME path when both exist | F-01; K-04; test/plugin-config.test.ts:417 | Fix precedence order to match documented model (primary → legacy); add explicit precedence test matrix |
| AUDIT-H7 | HIGH | npm run pack:check FAILS exit=1: pack budget violation; published tarball likely includes unintended files | LM-01; docs/audits/evidence/pack-check.txt | Inspect pack manifest; tighten files field in package.json; add CI gate on pack size delta |
| AUDIT-H8 | HIGH | AGENTS.md stale across 4 axes: v0.1.x/Commit 9ac8a84/Generated 2026-03-01/"87 files, 2071 tests" vs reality v1.2.7/1f6da97/225 files/3418 tests | LM-02; K-01; AGENTS.md §OVERVIEW vs context.txt | Regenerate AGENTS.md via /init-deep or equivalent; make generation a release gate |
| AUDIT-H9 | HIGH | SSE non-streaming conversion buffers full stream up to 10MB and silently discards malformed JSON chunks | H-03; lib/request/response-handler.ts | Surface malformed-chunk warnings via logger.warn; add structured parse-error taxonomy; consider streaming decode instead of buffer-then-parse |
| AUDIT-H10 | HIGH | Active-account pointer can dangle after disable: getActiveIndexForFamily() clamps bounds only; setAccountEnabled() does not repair pointer | D-05; lib/accounts.ts:506-512,583-592,1145-1155, lib/runtime/account-status.ts:10-17 | Normalize active indices on every disable/remove; "active" means routable, not merely in-range |
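
The first fix direction for AUDIT-H2 (a selector that returns null when nothing is available) is sketched below with a plain least-recently-used pick standing in for the hybrid scoring. The Account shape and both function names are hypothetical and much simpler than the real selectHybridAccount().

```typescript
// Hypothetical, simplified account record for illustration only.
interface Account {
  id: string;
  enabled: boolean;
  rateLimitedUntil: number; // epoch ms; 0 when not rate-limited
  lastUsed: number;         // epoch ms
}

function isAvailable(account: Account, now: number): boolean {
  return account.enabled && account.rateLimitedUntil <= now;
}

// Returns the least-recently-used available account, or null when no
// account qualifies. There is no fallback to unavailable accounts.
function selectAvailableAccount(accounts: Account[], now: number): Account | null {
  const available = accounts.filter((a) => isAvailable(a, now));
  if (available.length === 0) return null;
  return available.reduce((best, a) => (a.lastUsed < best.lastUsed ? a : best));
}
```

With a nullable return type, the fetch loop is forced by the type checker to handle the "no account" case instead of trusting whatever the selector hands back.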

6. Medium Improvements

| ID | Severity | Claim | Evidence | Fix direction |
| --- | --- | --- | --- | --- |
| AUDIT-M01 | MEDIUM | Recovery/session storage uses direct sync writes/deletes (no atomic temp+rename, no retry) | E-03; lib/recovery/storage.ts:7,167-168,261-262,281-282,374-375 | Introduce shared atomic-write helper for recovery files + retry-safe deletes on Windows lock codes |
| AUDIT-M02 | MEDIUM | Concurrency guard is in-process only: two CLI processes can race on shared ~/.codex/multi-auth files | E-04; lib/storage/transactions.ts:10-34, lib/unified-settings.ts:23,423-430 | Add advisory file locking OR journal/compare-and-swap for shared files |
| AUDIT-M03 | MEDIUM | Active code supports V1↔V3 only; no V2 migration path despite docs claiming V1/V2→V3 | E-05; lib/storage.ts:1155-1180, lib/storage/migrations.ts:83-116 | Implement explicit V2 handling OR correct docs |
| AUDIT-M04 | MEDIUM | Account-clear writes reset marker AFTER deleting artifacts: a crash in between looks like accidental loss | E-07; lib/storage/account-clear.ts:58-64 | Align account-clear ordering with flagged-clear flow: marker first |
| AUDIT-M05 | MEDIUM | Flagged-account read retry omits EPERM | E-08; lib/storage/flagged-storage-file.ts:4-26, lib/storage/flagged-storage-io.ts:34-52 | Include EPERM in retryable flagged-read codes |
| AUDIT-M06 | MEDIUM | Routing non-determinism: PID-based bias + cursor mutation makes reproduction hard | D-02; lib/rotation.ts:332-338,400-425, lib/accounts.ts:693-696 | Make deterministic mode default; gate PID bias behind explicit opt-in flag |
| AUDIT-M07 | MEDIUM | Health + quota tracker state NOT persisted; resets on restart | D-03; lib/rotation.ts:78-83,184-188,543-557, lib/accounts.ts:887-910 | Persist routing state OR explicitly mark ephemeral + avoid "stable" health claims |
| AUDIT-M08 | MEDIUM | lib/health.ts stale vs live AccountManager state: uses wrong field names | D-04; lib/health.ts:27-53, lib/accounts.ts:253-255,725-731,806-818 | Rebuild health report from AccountManager directly OR delete stale abstraction |
| AUDIT-M09 | MEDIUM | Project-scoped isolation silently bypassed when Codex CLI sync enabled: forces global storage | D-06; lib/runtime/storage-scope.ts:20-34, lib/storage.ts:598-623 | Treat as hard config conflict with surfaced state; scope synced state per project identity |
| AUDIT-M10 | MEDIUM | Stream failover bypasses server-error policy path: 5xx bursts on fallback don't update shared cooldown | D-08; index.ts:1974-2059,2198-2507,2395-2445 | Reuse evaluateFailurePolicy() helper inside stream failover |
| AUDIT-M11 | MEDIUM | Callback server doesn't eager-close on terminal outcomes; close() doesn't await shutdown | C-AUTH-11; lib/auth/server.ts:41-99 | Close immediately after terminal outcomes; convert mismatch/duplicate-code paths into explicit terminal results |
| AUDIT-M12 | MEDIUM | Token freshness trusts persisted expires/expiresAt only: doesn't decode JWT exp | C-AUTH-07; lib/proactive-refresh.ts:54-72,81-85, lib/auth/auth.ts:165-179, lib/accounts.ts:1033-1039 | Fall back to decoded JWT exp when metadata missing; treat missing expiry as refresh-needed |
| AUDIT-M13 | MEDIUM | Access tokens + refresh tokens both stored plaintext JSON (file mode 0600) | C-AUTH-08; lib/storage/migrations.ts:46-69, lib/accounts.ts:887-907, lib/storage.ts:1673-1687 | Minimize access-token caching OR move secrets to OS keychain |
| AUDIT-M14 | MEDIUM | Port 1455 duplicated across server bind + status copy + oauth-success.html instead of derived from single parsed redirect URI | C-AUTH-04; lib/auth/server.ts:78-80,107, lib/ui/copy.ts:67, lib/oauth-success.html:117-123 | Parse redirect URI once; feed server bind + UI copy + html from shared helpers |
| AUDIT-M15 | MEDIUM | Manual-paste callback returns null on state mismatch: surfaced as generic callback-miss text | C-AUTH-06; lib/codex-manager.ts:1257-1332,1854-1867, lib/runtime/manual-oauth-flow.ts:59-69,81-86 | Return structured mismatch error; surface explicit state-mismatch in both flows |
| AUDIT-M16 | MEDIUM | No distinct connect timeout: single total timeout for both connect + body/stream | H-02; lib/request/fetch-helpers.ts:724-969 | Add connectTimeoutMs separate from total + stall |
| AUDIT-M17 | MEDIUM | Observability uneven: trace ID / account ID / attempt # not uniformly attached across retry/failover branches | H-05; index.ts multi call sites | Define log schema with mandatory correlation fields; structured logger with required keys |
| AUDIT-M18 | MEDIUM | Deprecation/sunset headers logged in success path only, not in error paths | H-08; lib/request/fetch-helpers.ts | Log deprecation headers in both success + error handling |
| AUDIT-M19 | MEDIUM | Mid-stream failover intentionally disabled after first byte: users see hard error mid-generation | H-04; lib/request/stream-failover.ts | Document limitation; consider opt-in "resume with marker" for idempotent prompts |
| AUDIT-M20 | MEDIUM | JSON.parse surface is 59 calls across 31 files; schema validation centralized in single lib/schemas.ts not applied at parse boundaries | I-03, I-04; lib/schemas.ts; high clusters in request-init, runtime/request-init, storage, recovery, fetch-helpers | Wrap JSON.parse call sites with safeParse* helpers from lib/schemas.ts; add schemas for currently-unvalidated payloads |
| AUDIT-M21 | MEDIUM | Dual-linter stack (eslint + biome) without documented scope separation | F-02; repo root files, package.json scripts | Adopt one OR document scope split (biome=format, eslint=correctness) |
| AUDIT-M22 | MEDIUM | prepare hook installs husky on every npm install: mutates .git/hooks/ as side effect | F-03; package.json scripts.prepare | Document side effect prominently in CONTRIBUTING; consider opt-in install |
| AUDIT-M23 | MEDIUM | Env var surface 11+ vars; no central env-schema | F-04; README Configuration section | Centralize env validation via z.object() in lib/schemas.ts; parse process.env at startup |
| AUDIT-M24 | MEDIUM | settings-hub.ts ~2100 LOC: overgrown file mixing theme/account/sync/diagnostics/experimental | G-01, JN-03; AGENTS.md §WHERE TO LOOK | Split by sub-concern (see Section 8 Refactor R1) |
| AUDIT-M25 | MEDIUM | auth list empty-storage message drift: test expects "Storage was intentionally reset." but code outputs "No accounts configured." / "Storage: " / "Storage health: empty" | G-02, K-03; test/codex-manager-cli.test.ts:913 | Align code output to documented messaging or update tests after agreeing on canonical message |
| AUDIT-M26 | MEDIUM | --json coverage unclear across subcommand surface (confirmed on report, doctor, verify-flagged; unverified on list, switch, check, forecast, fix) | G-03; README | Audit each subcommand; standardize --json + deterministic exit codes + schema |
| AUDIT-M27 | MEDIUM | Bifurcation lib/codex-cli/ and lib/codex-manager/ without documented boundary | G-06, JN-04; AGENTS.md §STRUCTURE | Document ownership map OR merge |
| AUDIT-M28 | MEDIUM | Experimental settings flagged but no stability-promise policy | G-09; README Experimental section | Document experimental-tier semver policy |
| AUDIT-M29 | MEDIUM | Error taxonomy implicit: no central CodexError/AuthError/NetworkError base class confirmed | JN-05 | Introduce structured error hierarchy; map failures to stable error codes |
| AUDIT-M30 | MEDIUM | Duplicate 1455 port constant across auth, server, copy, html (cross-ref C-AUTH-04) | JN-08, C-AUTH-04 | Derive from shared helper |
| AUDIT-M31 | MEDIUM | 6 tmp files at repo root (tmp-flagged.json.*.tmp, tmp-accounts.marker): test-cleanup leakage | E-02, JN-09, LM-06; clean-repo-check.txt footer + test/account-clear.test.ts:13-45, test/flagged-storage-io.test.ts:29-53 | Move tests to temp dirs via os.tmpdir(); use shared retry cleanup helper |
| AUDIT-M32 | MEDIUM | CHANGELOG drift check incomplete: v1.2.5/6/7 vs git log v1.2.4..HEAD not cross-referenced | LM-07 | Run CHANGELOG check per release; add CI gate |
| AUDIT-M33 | MEDIUM | Semver over v1.2.4–v1.2.7: no 2.x breaking changes; need to verify no silent breaking behaviors in minors | LM-10 | Add behavioral-change flag in CHANGELOG entries |
| AUDIT-M34 | MEDIUM | Non-streaming SSE conversion buffers full stream up to 10MB in memory before parse | P-03, H-03 | Streaming decode OR bounded chunk count |
| AUDIT-M35 | MEDIUM | Hot paths (request pipeline, SSE parser, account selection, storage writes, token refresh) lack benchmarks | P-02 | Add micro-benchmarks for top-3 hot paths |
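
The fix direction for AUDIT-M12 (fall back to the JWT exp claim, and treat a missing expiry as refresh-needed) might be sketched as follows. Function names and the 60-second skew are assumptions; the decode step reads the claim without verifying the signature, which is acceptable only for a freshness hint.

```typescript
// Reads the `exp` claim (seconds since epoch) from a JWT payload and returns
// it in milliseconds. No signature verification is performed.
function decodeJwtExpMs(token: string): number | null {
  const parts = token.split(".");
  const payloadPart = parts[1];
  if (parts.length !== 3 || payloadPart === undefined) return null;
  try {
    const json = Buffer.from(payloadPart, "base64url").toString("utf8");
    const payload = JSON.parse(json) as { exp?: unknown };
    return typeof payload.exp === "number" ? payload.exp * 1000 : null;
  } catch {
    return null; // malformed base64 or JSON
  }
}

// Persisted expiry wins; otherwise fall back to the token's own `exp`.
// When neither is known, the token is treated as needing a refresh.
function needsRefresh(
  expiresAt: number | undefined,
  accessToken: string,
  now: number,
  skewMs = 60_000, // assumed safety margin
): boolean {
  const expiry = expiresAt ?? decodeJwtExpMs(accessToken);
  if (expiry === null) return true;
  return expiry - skewMs <= now;
}
```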

7. Low-Priority Cleanups

| ID | Severity | Claim | Evidence | Fix |
| --- | --- | --- | --- | --- |
| AUDIT-L01 | LOW | PKCE entropy is "probable strong": generator lives in external dep, not audited in-repo | C-AUTH-01 | Add regression test asserting S256 + document trust boundary |
| AUDIT-L02 | LOW | Token endpoint + authorize endpoint hardcoded: no allowlist abstraction | C-AUTH-12 | Centralize behind small allowlisted resolver for test/regional variants |
| AUDIT-L03 | LOW | Auth storage corruption recovery works but silent: no operator-visible signal when WAL/backup path taken | C-AUTH-09 | Emit audit log when recovery path used |
| AUDIT-L04 | LOW | docs/reference/storage-paths.md references non-existent deriveProjectKey; code exports getProjectStorageKey | E-06; docs/reference/storage-paths.md:67-76, lib/storage/paths.ts:217-245 | Update docs |
| AUDIT-L05 | LOW | Recovery readers silently skip unreadable files (no corruption signal) | E-09; lib/recovery/storage.ts:67-114,273-384 | Log corruption counts/paths |
| AUDIT-L06 | LOW | docs/reference/settings.md precedence rule not formally verified | F-07 | Add formal precedence table |
| AUDIT-L07 | LOW | Fallback 429 path hardcodes "quota" reason in one branch; primary passes stableAccountKey | H-06; index.ts:2408-2412 | Align fallback branch with primary |
| AUDIT-L08 | LOW | Empty-response retry after SSE conversion may trigger unnecessary round-trip | H-10; index.ts:2169-2681 | Add log; bounded count |
| AUDIT-L09 | LOW | Test output contains stray PowerShell node.exe error lines on Windows: harness brittleness | K-09; test-summary.txt:16-22 | Investigate stderr redirection |
| AUDIT-L10 | LOW | Test import phase 40s dominates startup | K-08 | Profile module graph for lazy-import opportunities |
| AUDIT-L11 | LOW | No perf regression CI gate; bench results not baselined | P-06 | Add perf CI with bench baseline commits |
| AUDIT-L12 | LOW | Repeated regex compilation not fully verified; potential hot-path trap | P-05 | Targeted new RegExp( scan |
| AUDIT-L13 | LOW | Property/chaos catalogs not explicitly inventoried in this audit | K-06 | Document invariants property-tested + failures chaos-injected |
| AUDIT-L14 | LOW | Dead code scan incomplete (ts-prune not run) | JN-07 | Run ts-prune pass; file findings |

8. Refactoring Plan

R1. Split lib/codex-manager/settings-hub.ts (2100 LOC)

  • Why: Single file mixing theme / accounts / sync / diagnostics / experimental. Future additions will make it worse. Cognitive load + merge-conflict surface.
  • Files: lib/codex-manager/settings-hub.ts → new settings-hub/{theme,accounts,sync,diagnostics,experimental,index}.ts
  • Target: Each sub-module <500 LOC; index.ts composes menu tree; each sub-module exports render() + action handlers
  • Implementation order: 1) create sub-module files with empty exports, 2) extract theme first (smallest + well-isolated), 3) extract experimental, 4) extract diagnostics, 5) extract sync, 6) extract accounts, 7) convert root file to composition
  • Migration risk: LOW — internal structure; public CLI surface unchanged. Test by bun test test/codex-manager-cli.test.ts at each step
  • Payoff: Reviewable diffs; independent module tests; contributor ramp-up easier

R2. Introduce RedirectURI single source of truth

  • Why: Current drift (localhost vs 127.0.0.1) causes confirmed login-break risk + 4+ duplicated port 1455 literals
  • Files: lib/auth/auth.ts (REDIRECT_URI const), lib/auth/server.ts (bind), lib/ui/copy.ts, lib/oauth-success.html, docs/reference/commands.md, docs/getting-started.md, CHANGELOG.md
  • Target: Single export const AUTH_REDIRECT = { host, port, path, origin, full } parsed once; all sites import
  • Order: 1) define + export constant, 2) migrate server bind, 3) migrate auth flow, 4) migrate copy/html, 5) regen docs from constant, 6) add invariant test
  • Migration risk: MEDIUM. The user-facing OAuth redirect changes if the host standardizes on 127.0.0.1; any OAuth client registration pinned to the old redirect URI may need updating
  • Payoff: Kills drift class; fixes AUDIT-H5
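
The R2 target shape can be sketched by parsing the redirect URI once and exporting the derived parts. The URI literal below is a placeholder (the audit does not settle the canonical host, and the callback path shown is an assumption), and parseRedirect is a hypothetical helper name.

```typescript
// Placeholder value: host choice and callback path are assumptions.
const REDIRECT_URI = "http://127.0.0.1:1455/auth/callback";

// Parses the redirect URI once so server bind, UI copy, and docs
// generation can all read from the same frozen record.
// Assumes the URI carries an explicit port.
function parseRedirect(uri: string) {
  const url = new URL(uri);
  return Object.freeze({
    host: url.hostname,
    port: Number(url.port),
    path: url.pathname,
    origin: url.origin,
    full: url.toString(),
  });
}

const AUTH_REDIRECT = parseRedirect(REDIRECT_URI);
```

An invariant test can then assert that the server's bind host and port equal AUTH_REDIRECT.host and AUTH_REDIRECT.port, which is the check that would have caught the localhost vs 127.0.0.1 drift.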

R3. Consolidate JSON.parse behind Zod schemas

  • Why: 59 parse calls across 31 files; single-file Zod hub exists but not applied at parse boundaries. Validation gap on untrusted payloads.
  • Files: lib/schemas.ts (add schemas), all lib/** with JSON.parse call sites (highest: lib/request/request-init.ts, lib/runtime/request-init.ts, lib/storage.ts, lib/recovery/storage.ts, lib/request/fetch-helpers.ts)
  • Target: Every JSON.parse → safeParse* helper returning { success, data | error }
  • Order: 1) schemas for storage payloads (highest blast radius), 2) recovery, 3) request/response, 4) config, 5) ancillary
  • Risk: LOW — additive change; fail-closed on parse error maps to clear operator signal
  • Payoff: Hardens boundaries; improves runtime error messages; sets pattern for future parse sites
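
The safeParse* pattern from R3 is sketched below without a Zod dependency so the example stays self-contained. The `validate` callback stands in for a schema check (in the repository this would presumably wrap a schema from lib/schemas.ts), and AccountFile is a made-up payload shape.

```typescript
type ParseResult<T> =
  | { success: true; data: T }
  | { success: false; error: string };

// Parses JSON and validates the result in one step; never throws.
function safeParseJson<T>(
  text: string,
  validate: (value: unknown) => value is T,
): ParseResult<T> {
  let raw: unknown;
  try {
    raw = JSON.parse(text);
  } catch (err) {
    return { success: false, error: `invalid JSON: ${(err as Error).message}` };
  }
  return validate(raw)
    ? { success: true, data: raw }
    : { success: false, error: "schema mismatch" };
}

// Hypothetical payload shape used only for this illustration.
interface AccountFile {
  version: number;
  accounts: unknown[];
}

function isAccountFile(value: unknown): value is AccountFile {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.version === "number" && Array.isArray(v.accounts);
}
```

Callers branch on `success` and surface `error` to the operator, which gives the fail-closed behavior described in the Risk bullet.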

R4. Introduce routing mutex + selection-record pattern

  • Why: Concurrent fetch + cursor mutation + debounced save can produce out-of-order persistence (AUDIT-H2 + D-09)
  • Files: lib/accounts.ts (cursor + lastUsed mutation), lib/rotation.ts (selector)
  • Target: selectAccount returns SelectionRecord = { account, selectionId, timestamp }; cursor advanced only after record accepted by fetch loop (or reverted on fast-fail); persistDebounced operates on ordered record queue
  • Order: 1) add record type, 2) thread through fetch loop, 3) wrap cursor mutation in withMutex, 4) replay tests under concurrency
  • Risk: MEDIUM — touches hot path; benchmark before merge
  • Payoff: Fixes hybrid-selector bug (AUDIT-H2) and 429-race (AUDIT-H3) without amplifying throttling
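
A minimal sketch of the R4 idea follows: selection and cursor advancement are serialized through a promise-chain mutex, and the cursor moves only when the caller accepts a record. Class names, the round-robin pick, and the record fields beyond those listed in the Target bullet are assumptions; the real selector is more involved.

```typescript
// Promise-chain mutex: queued functions run one at a time, in call order.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  run<T>(fn: () => T | Promise<T>): Promise<T> {
    const result = this.tail.then(fn);
    // Keep the chain alive even if fn rejects.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

interface SelectionRecord {
  accountId: string;
  selectionId: number;
  timestamp: number;
}

class Selector {
  private cursor = 0;
  private nextId = 1;
  private lastAcceptedId = 0;
  private readonly mutex = new Mutex();
  private readonly accountIds: string[];

  constructor(accountIds: string[]) {
    this.accountIds = accountIds;
  }

  // Proposes an account without advancing the cursor.
  select(now: number): Promise<SelectionRecord | null> {
    return this.mutex.run((): SelectionRecord | null => {
      if (this.accountIds.length === 0) return null;
      const accountId = this.accountIds[this.cursor % this.accountIds.length]!;
      return { accountId, selectionId: this.nextId++, timestamp: now };
    });
  }

  // Advances the cursor once the fetch loop accepts the record.
  accept(record: SelectionRecord): Promise<void> {
    return this.mutex.run(() => {
      this.cursor += 1;
      this.lastAcceptedId = record.selectionId;
    });
  }
}
```

A fast-failing request simply never calls accept(), so the cursor is not consumed by attempts that did not use the account.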

R5. Unify lib/health.ts with live AccountManager state

  • Why: Stale abstraction reports wrong fields (D-04)
  • Files: lib/health.ts, lib/accounts.ts
  • Target: getAccountHealth reads from tracker state directly OR module deleted and consumers redirected
  • Order: 1) inventory consumers, 2) refactor to tracker-direct, 3) delete stale module if no consumers, 4) regression-test health snapshot shape
  • Risk: LOW — internal; diagnostic output shape may shift (communicate in CHANGELOG)
  • Payoff: Fixes operator-visible drift

R6. Harden recovery storage with atomic writes

  • Why: E-03 confirmed violation; only recovery still uses direct sync writes
  • Files: lib/recovery/storage.ts
  • Target: Extract shared atomicWriteFile(path, data) helper used by lib/storage.ts + apply to recovery
  • Order: 1) extract helper, 2) migrate recovery writes, 3) add retry-safe delete helper for Windows locks, 4) regression tests
  • Risk: LOW
  • Payoff: Recovery state survives mid-write crashes
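
The temp+rename pattern that R6 proposes to share can be sketched as below. This is a synchronous illustration with a hypothetical name, not the helper in lib/storage.ts; it omits fsync and the Windows lock-code retries mentioned in the Order bullet.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Writes to a sibling temp file, then renames it over the target.
// Readers see either the old file or the complete new one.
function atomicWriteFileSync(targetPath: string, data: string): void {
  const dir = path.dirname(targetPath);
  const base = path.basename(targetPath);
  const tmpPath = path.join(dir, `.${base}.${process.pid}.${Date.now()}.tmp`);
  try {
    fs.writeFileSync(tmpPath, data, { mode: 0o600 });
    fs.renameSync(tmpPath, targetPath);
  } catch (err) {
    // Best-effort cleanup so failed writes do not leave temp files behind.
    fs.rmSync(tmpPath, { force: true });
    throw err;
  }
}
```

The temp file lives in the same directory as the target so the rename stays on one filesystem, which is what makes it atomic on POSIX.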

9. Feature Recommendations

All features tied to concrete findings. Priorities: H = next cycle, M = subsequent, L = opportunistic.

F1. codex auth why-selected [--last|--now|--json] (Priority: H)

  • Problem: AUDIT-M06 + D-04 — users can't understand why a given account was chosen; routing non-determinism + stale health model erode trust
  • Fit: Complements existing status/report/forecast; same TUI/JSON pattern
  • Complexity: S — read from existing tracker snapshot; add one CLI verb
  • Deps: R5 (unified health)
  • Risk: LOW — read-only

F2. codex auth verify --paths [--json] (Priority: H)

  • Problem: AUDIT-H1 — resolvePath regression means users need a self-test for their import/export targets before using them
  • Fit: Aligns with doctor --fix philosophy
  • Complexity: S
  • Deps: None
  • Risk: LOW

F3. Per-account disable/quarantine with explicit state + TTL (Priority: H)

  • Problem: AUDIT-H10 (dangling active pointer) + D-05 — no explicit quarantine/disable lifecycle; state transitions implicit
  • Fit: Makes Section 8 R4 tangible at UX layer
  • Complexity: M — add state: { value, enteredAt, ttl?, reason } to account record; migrate storage schema (bump V3 → V4 with migration); update selection/health
  • Deps: R4, storage V4 migration
  • Risk: MEDIUM — schema change; release-note required
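
One possible shape for the F3 state field, as a sketch. The field names value, enteredAt, ttl, and reason come from the Complexity note above; the concrete state values, the millisecond units, and the helper names are assumptions.

```typescript
// Hypothetical state values; the audit only names the record fields.
type AccountStateValue = "active" | "disabled" | "quarantined";

interface AccountState {
  value: AccountStateValue;
  enteredAt: number; // epoch milliseconds (assumed unit)
  ttl?: number;      // milliseconds; absent means no automatic expiry
  reason: string;
}

// Resolve the effective state at `now`, lifting a quarantine whose TTL has elapsed.
function effectiveState(state: AccountState, now: number): AccountStateValue {
  const expired =
    state.value === "quarantined" &&
    state.ttl !== undefined &&
    now >= state.enteredAt + state.ttl;
  return expired ? "active" : state.value;
}

// Selection would consult this instead of inferring availability implicitly.
function isSelectable(state: AccountState, now: number): boolean {
  return effectiveState(state, now) === "active";
}
```

Making the transition a pure function of the record and the clock keeps it easy to test and lets why-selected style diagnostics report the same answer the selector used.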

F4. Structured incident-report bundle codex auth bundle [--out <path>] (Priority: M)

  • Problem: H-05 (uneven observability) + J/N findings — post-failure reconstruction is hard; users need a redacted diagnostics tarball to share
  • Fit: Extends report --json; produces filesystem bundle (logs + redacted state + env + timestamps)
  • Complexity: M — needs redaction helper (reuse from this audit); tar/zip output
  • Deps: Structured logger (R via JN-06)
  • Risk: HIGH if redaction is weak — must apply JWT/email/path redaction before bundling
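
A rough sketch of the JWT/email/path redaction pass that F4 depends on. The patterns and placeholder strings are illustrative assumptions, not the audit's redaction helper, and a production version would need a broader pattern set and its own validation grid.

```typescript
// Three dot-separated base64url segments starting with "eyJ" (a JSON object header).
const JWT_RE = /\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g;
// Deliberately simple email pattern; good enough for log scrubbing.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Redact tokens and emails, and collapse the user's home directory to "~".
function redact(text: string, homeDir: string): string {
  let out = text
    .replace(JWT_RE, "[REDACTED_JWT]")
    .replace(EMAIL_RE, "[REDACTED_EMAIL]");
  if (homeDir.length > 0) {
    out = out.split(homeDir).join("~");
  }
  return out;
}

const line =
  "token=eyJhbGciOiJub25lIn0.eyJzdWIiOiIxIn0.c2ln user=a.b@example.com path=/home/alice/.codex/accounts.json";
console.log(redact(line, "/home/alice"));
// token=[REDACTED_JWT] user=[REDACTED_EMAIL] path=~/.codex/accounts.json
```

Redaction should run on every file before it enters the bundle, so the archive step never sees raw state.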

F5. Repair/migration dry-run preview codex auth fix --preview --json (Priority: M)

  • Problem: AUDIT-M04 (account-clear ordering) — users can't preview safe fixes before applying
  • Fit: Existing fix --dry-run has shape; extend with structured output
  • Complexity: S
  • Risk: LOW

F6. Shell completion for codex auth (Priority: M)

  • Problem: G-08 — help discoverability unverified; rich subcommand surface
  • Fit: Standard OSS CLI DX
  • Complexity: S — generate bash/zsh/fish/powershell completions
  • Risk: LOW

F7. Codex CLI compatibility probe codex auth compat [--json] (Priority: M)

  • Problem: Implicit version coupling to upstream @openai/codex or native binary; no compatibility check
  • Fit: Guards install/upgrade flow
  • Complexity: S
  • Risk: LOW

F8. Pool backup + restore with filename prompt (Priority: M)

  • Note: Already in experimental tier per README — graduate to stable after adding collision safety test
  • Problem: Experimental tier unstable; users want safe backup/restore
  • Fit: Stabilize existing flow
  • Complexity: S (test + docs)
  • Risk: LOW

F9. Machine-readable status/forecast/report unification (Priority: M)

  • Problem: AUDIT-M26 — --json coverage inconsistent
  • Fit: Standardize schema version across status, check, list, forecast, report
  • Complexity: M
  • Risk: LOW

F10. Perf regression CI gate (Priority: L)

  • Problem: P-06 — no baseline
  • Fit: Extends existing CI
  • Complexity: M (bench baseline commits + comparison job)

10. Security / Trust Review

Auth / Token Handling

  • Positive: OAuth state 128-bit (C-AUTH-02); PKCE S256 (C-AUTH-01); file mode 0600 on token storage
  • Concerns: Live OAuth URL leaks to stdout/clipboard (AUDIT-H4); redirect host drift localhost vs 127.0.0.1 (AUDIT-H5); JWT exp not validated on load (AUDIT-M12); access + refresh tokens both in plaintext JSON at rest (AUDIT-M13); token endpoint hardcoded single-host (AUDIT-L02)

Local State / Storage

  • Positive: Atomic writes for primary storage + WAL/backup corruption recovery (C-AUTH-09)
  • Concerns: resolvePath lookalike-prefix bypass (AUDIT-H1); recovery storage non-atomic (AUDIT-M01); in-process-only concurrency guard (AUDIT-M02); V2 migration missing (AUDIT-M03); project-scoped isolation collapses to global on CLI sync (AUDIT-M09)

Privacy / Logging

  • Positive: docs/privacy.md exists; clean audit:ci
  • Concerns: Structured logger present but fields inconsistent (AUDIT-M17); WAL/backup recovery silent (AUDIT-L03); recovery readers silently skip unreadable files (AUDIT-L05)

Trust Messaging vs Real Guarantees

  • README claims reliability behaviors (whole-pool replay disabled by default, bounded outbound budget, burst cooldown) — verified in code
  • Docs claim canonical redirect is 127.0.0.1:1455 — code uses localhost:1455 (DRIFT: AUDIT-H5)
  • AGENTS.md claims "87 files, 2071 tests" — reality 225/3418 (DRIFT: AUDIT-H8)

Recommended Actions
  1. Fix resolvePath lookalike bypass + add regression tests for home/project/tmp lookalikes
  2. Redact OAuth URL in user-facing output (AUDIT-H4)
  3. Reconcile redirect host: pick 127.0.0.1 (better for OAuth pinning) + update all 4+ sites + tests + docs (AUDIT-H5)
  4. Decode JWT exp on token load; treat missing expiry as refresh-needed (AUDIT-M12)
  5. Consider OS keychain integration for token storage (AUDIT-M13)
  6. Apply Zod at all JSON.parse boundaries (R3)
  7. Add redaction helper as published module (also supports F4 incident bundle)
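
Item 4 above (decode JWT exp on load, treat a missing expiry as refresh-needed) could look roughly like this. The function names and the 60-second skew are assumptions. The sketch only reads the claim and does not verify the signature, which is acceptable only for a local freshness check.

```typescript
// Returns the `exp` claim in epoch seconds, or null when missing or unparseable.
function readJwtExp(token: string): number | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  try {
    const payload: unknown = JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
    if (typeof payload !== "object" || payload === null || !("exp" in payload)) return null;
    const exp = (payload as { exp: unknown }).exp;
    return typeof exp === "number" && Number.isFinite(exp) ? exp : null;
  } catch {
    return null; // malformed base64 or JSON
  }
}

// Missing or unreadable expiry is treated as refresh-needed.
function needsRefresh(token: string, nowMs: number, skewSeconds = 60): boolean {
  const exp = readJwtExp(token);
  if (exp === null) return true;
  return nowMs / 1000 >= exp - skewSeconds;
}

// Helper for the example only: builds an unsigned token with the given payload.
function fakeToken(payload: Record<string, unknown>): string {
  const enc = (v: unknown) => Buffer.from(JSON.stringify(v)).toString("base64url");
  return `${enc({ alg: "none" })}.${enc(payload)}.sig`;
}

const now = 1_700_000_000_000;
console.log(needsRefresh(fakeToken({ exp: 1_700_000_600 }), now)); // false
console.log(needsRefresh(fakeToken({}), now));                     // true
```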

11. Testing Gap Analysis

Covered Well

  • V3 storage format (fixtures in test/fixtures/v3-storage.json)
  • Refresh-queue race dedupe (C-10)
  • Chaos test directory + property-test directory exist — good stratification (K-06)
  • Hermeticity works as designed when env redirected (K-05)

Under-Tested

  • resolvePath lookalike rejection — currently BROKEN (K-02)
  • Hybrid selector behavior when no accounts available (D-01) — no regression test
  • Short-window 429 concurrent-request race (D-07) — no concurrent-request test
  • loadPluginConfig CONFIG_PATH precedence (K-04) — test exists but is failing
  • auth list empty-storage message (K-03): test exists, but the test and the emitted message have drifted apart
  • V2 migration (E-05) — absent code path, absent tests
  • SSE malformed-chunk handling (H-03) — silent discard, likely no explicit coverage
  • Mid-stream failover recovery (H-04) — chaos test candidate
  • CHANGELOG-to-git-log consistency per release (LM-07)
  • Pack-budget regression (LM-01) — failing on HEAD

Exact Cases To Add (ordered by impact)

  1. resolvePath lookalike prefix → throw on Windows + POSIX (fixes AUDIT-H1; unblocks coverage)
  2. selectHybridAccount when all accounts blocked → returns null (fixes AUDIT-H2)
  3. Concurrent requests on single rate-limited account → one marks, rest wait (fixes AUDIT-H3)
  4. loadPluginConfig with both CONFIG_PATH + CODEX_HOME set → primary wins (fixes AUDIT-H6)
  5. auth list empty storage → canonical message (fixes AUDIT-M25)
  6. V2 storage payload → either migrates or rejects explicitly (fixes AUDIT-M03)
  7. SSE malformed chunk → emits structured warn log (fixes AUDIT-H9)
  8. Pack-size delta → CI gate (fixes AUDIT-H7)
  9. Invariant: PKCE always S256 (preserves C-AUTH-01)
  10. Invariant: OAuth state always 16-byte crypto random (preserves C-AUTH-02)
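
Case 1 can be sketched as a containment check plus the lookalike inputs it must reject. This assumes the bypass is the classic string-prefix problem, where /home/alice-evil passes a startsWith("/home/alice") test; the audit text does not show the actual resolvePath code, and isInsideBase is a hypothetical name.

```typescript
import { posix, win32 } from "node:path";

type PathApi = typeof posix;

// True only when `target` is `base` itself or nested beneath it.
// Comparing via relative() avoids the string-prefix trap.
function isInsideBase(base: string, target: string, api: PathApi): boolean {
  const rel = api.relative(api.resolve(base), api.resolve(target));
  if (rel === "") return true;
  if (rel === ".." || rel.startsWith(".." + api.sep)) return false;
  return !api.isAbsolute(rel); // e.g. a different drive on Windows
}

// The naive prefix test accepts a lookalike sibling directory:
console.log("/home/alice-evil/x".startsWith("/home/alice"));              // true
// The relative-path test rejects it on both path flavors:
console.log(isInsideBase("/home/alice", "/home/alice-evil/x", posix));    // false
console.log(isInsideBase("C:\\Users\\alice", "C:\\Users\\alice-evil\\x", win32)); // false
console.log(isInsideBase("/home/alice", "/home/alice/.codex/a.json", posix));     // true
```

Passing the path flavor explicitly lets one test file cover Windows and POSIX behavior regardless of the host OS.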

Best Order To Improve Confidence

  1. Add security and routing regressions first (cases 1-3; highest blast radius)
  2. Fix existing failing tests (cases 4-5) — green baseline
  3. Add taxonomy gaps (cases 6-7)
  4. Add CI gates (case 8)
  5. Lock in positives (cases 9-10)
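
Cases 9-10 might be locked in along these lines. The generator functions here are stand-ins, since the audit does not name the ones in lib/auth/; the real tests would import those instead. The verifier/challenge pair is the published example from RFC 7636, Appendix B.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Stand-in generators (hypothetical).
function createState(): string {
  return randomBytes(16).toString("hex"); // 16 bytes = 128 bits
}

function createVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256: BASE64URL(SHA-256(ASCII(code_verifier))), no padding.
function s256Challenge(verifier: string): string {
  return createHash("sha256").update(verifier).digest("base64url");
}

// Known-answer check against the RFC 7636 Appendix B example.
const rfcVerifier = "dBjftJeZ4CVP-mB92K27uhbUJU1p1r_wW1gFWFOEjXk";
const rfcChallenge = "E9Melhoa2OwvFrEMTJguCHaoeK1t8URWbuGJSstw-cM";
console.log(s256Challenge(rfcVerifier) === rfcChallenge); // true
console.log(createState().length);                        // 32 hex characters
```

A known-answer test catches an accidental switch to the plain method or to padded base64; the length check catches a silent reduction in state entropy.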

12. CLI / DX / Docs Review

Command Ergonomics (Strong)

  • README "Start here / Daily use / Repair / Advanced" taxonomy is excellent
  • doctor --fix as canonical safe-recovery is a good trust pattern
  • Dashboard hotkeys (Q=cancel) consistent per AGENTS.md

Command Ergonomics (Gaps)

  • --json coverage uneven (AUDIT-M26)
  • No shell completion (F6)
  • No why-selected visibility (F1)
  • No incident bundle (F4)
  • Help discoverability unverified (G-08)

Install/Upgrade Flow

  • npm i -g codex-multi-auth standard; legacy @ndycode/codex-multi-auth migration path documented — GOOD
  • prepare → husky install side effect undocumented in CONTRIBUTING (AUDIT-M22)
  • pack:check fails — tarball bloat (AUDIT-H7)

Docs Mismatches

  • AGENTS.md stale across 4 axes (AUDIT-H8)
  • Redirect host localhost vs 127.0.0.1 (AUDIT-H5)
  • docs/reference/storage-paths.md references deriveProjectKey (does not exist); code uses getProjectStorageKey (AUDIT-L04)
  • CHANGELOG-to-git-log cross-ref not verified (AUDIT-M32)

Contributor Ergonomics

  • Governance files complete (SECURITY.md, CODE_OF_CONDUCT, CONTRIBUTING, LICENSE) — LM-08
  • Runbooks present in docs/development/ (RUNBOOK_ADD_AUTH_COMMAND.md, etc.) — strong onboarding signal
  • Strict TS + 0 escape hatches + lint — signals strong taste (I-01, I-02)
  • Dual linter confusion (AUDIT-M21)

OSS Readiness: STRONG

  • All governance stack present
  • CodeQL + plugin scanner + CI + PR-CI workflows
  • Clean supply chain audit
  • Active security bump cadence
  • README structural quality high

13. Quick Wins

Each is S-effort (hours, not days):

  1. Fix AGENTS.md staleness (regen via /init-deep) — AUDIT-H8
  2. Fix README/docs redirect host (localhost → 127.0.0.1) — AUDIT-H5
  3. Fix docs/reference/storage-paths.md deriveProjectKey typo — AUDIT-L04
  4. Delete/fix the 6 repo-root tmp files + patch leaking tests (test/account-clear.test.ts, test/flagged-storage-io.test.ts) — AUDIT-M31
  5. Add EPERM to flagged-read retry codes — AUDIT-M05
  6. Align account-clear marker-first ordering with flagged-clear flow — AUDIT-M04
  7. Emit audit log when WAL/backup corruption recovery is used — AUDIT-L03
  8. Log deprecation headers in error paths too — AUDIT-M18
  9. Document prepare-hook husky side effect in CONTRIBUTING.md — AUDIT-M22
  10. Document dual-linter scope (eslint=correctness, biome=format) in CONTRIBUTING.md — AUDIT-M21
  11. Align fallback 429 path to pass stableAccountKey — AUDIT-L07
  12. Add invariant test: PKCE always S256 — preserves C-AUTH-01
  13. Add invariant test: OAuth state 16 crypto bytes — preserves C-AUTH-02
  14. Add pack-size CI gate (runs pack:check, fails PR on regression) — AUDIT-H7 preventive
  15. Add strict: true explicit documentation to CONTRIBUTING — I-05 lock-in
  16. Document codex-cli vs codex-manager boundary in lib/AGENTS.md — AUDIT-M27
  17. Add resolvePath regression tests (home/project/tmp lookalikes, Windows+POSIX) — AUDIT-H1 coverage
  18. Redact OAuth URL in user-facing browser-launch output — AUDIT-H4
  19. Move test tmp files to os.tmpdir() — AUDIT-M31
  20. Document experimental-tier stability policy in README — AUDIT-M28

14. Phased Implementation Roadmap

Phase 1: Correctness & Safety (2–3 weeks)

Scope: All HIGH findings + security-relevant MEDIUM. Tasks:

  • Fix AUDIT-H1 resolvePath + add regression tests
  • Fix AUDIT-H2 hybrid selector contract + test
  • Fix AUDIT-H3 short-429 race + concurrent-request test
  • Fix AUDIT-H4 OAuth URL redaction
  • Fix AUDIT-H5 redirect host (R2 refactor)
  • Fix AUDIT-H6 CONFIG_PATH precedence
  • Fix AUDIT-H7 pack:check + add CI gate
  • Regen AGENTS.md (AUDIT-H8)
  • Fix AUDIT-H9 SSE malformed-chunk logging
  • Fix AUDIT-H10 active-pointer dangling
  • Fix 3 failing tests (K-02, K-03, K-04)

Deps: None (correctness fixes are independent)
Benefits: Unblocks release; restores test-suite green baseline; fixes confirmed security regression
Rollback: Each fix is small/atomic; per-commit revert

Phase 2: Architecture & Refactor (3–5 weeks)

Scope: R1-R6 refactors + error-taxonomy introduction. Tasks:

  • R1 settings-hub split
  • R2 redirect-URI SSOT (from Phase 1)
  • R3 JSON.parse → Zod schemas at boundaries
  • R4 routing mutex + selection-record
  • R5 unify health with AccountManager
  • R6 atomic writes for recovery
  • AUDIT-M29 introduce CodexError/AuthError/NetworkError hierarchy
  • AUDIT-M17 structured logger schema with required correlation fields

Deps: Phase 1 green baseline
Benefits: Long-term maintainability; reduced race surface; clean observability
Rollback: R1 + R5 + R6 independent; R2 already Phase-1; R3 additive; R4 needs feature flag for safe rollback
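
The AUDIT-M29 hierarchy could start from something like the sketch below. Only the three class names come from the task list; the code and retryable fields, the example code strings, and isRetryable are assumptions.

```typescript
// Base class: every domain error carries a stable machine-readable code.
class CodexError extends Error {
  readonly code: string;

  constructor(message: string, code: string) {
    super(message);
    this.code = code;
    this.name = new.target.name;
    // Keeps instanceof working if the build ever targets ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

class AuthError extends CodexError {}

class NetworkError extends CodexError {
  readonly retryable: boolean;

  constructor(message: string, code: string, retryable: boolean) {
    super(message, code);
    this.retryable = retryable;
  }
}

// Callers branch on type instead of matching message strings.
function isRetryable(err: unknown): boolean {
  return err instanceof NetworkError && err.retryable;
}

const e = new NetworkError("upstream timed out", "E_TIMEOUT", true);
console.log(e.name, e.code, isRetryable(e)); // NetworkError E_TIMEOUT true
```

A stable code field also gives the structured logger (AUDIT-M17) one consistent key to emit for every failure path.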

Phase 3: Testing, Docs, DX (2–3 weeks)

Scope: Test gap closure + docs truth-up + DX features. Tasks:

  • Testing cases 1-10 from Section 11
  • --json standardization (AUDIT-M26)
  • Shell completion (F6)
  • Fix docs drifts (AUDIT-L04, AUDIT-M32)
  • Document experimental-tier policy (AUDIT-M28)
  • Document codex-cli vs codex-manager (AUDIT-M27)
  • Add perf regression CI (AUDIT-L11)

Deps: Phase 1 + Phase 2
Benefits: Contributor confidence; operator confidence; release safety
Rollback: All additive

Phase 4: Strategic Features (4–6 weeks)

Scope: Feature recs F1-F9. Tasks:

  • F1 why-selected (depends on R5)
  • F2 verify --paths
  • F3 per-account quarantine state (needs V3→V4 migration)
  • F4 incident bundle (needs redaction helper + structured logger, AUDIT-M17)
  • F5 fix --preview
  • F7 Codex CLI compat probe
  • F8 graduate backup/restore from experimental
  • F9 unified machine-readable outputs

Deps: Phase 2 (schemas, logger, routing mutex)
Benefits: User experience, debuggability, trust
Rollback: Each feature behind --experimental flag until stable

15. Top 20 Prioritized Actions

Formula: score = severity_weight × probability × blast_radius. Weights: CRITICAL=5, HIGH=4, MEDIUM=2, LOW=1.

| Rank | Action | Category | Reason | Difficulty | Impact |
|---|---|---|---|---|---|
| 1 | Fix resolvePath lookalike bypass + regression tests | Security/Correctness | HIGH-security path guard failure | M | HIGH |
| 2 | Fix hybrid selector to return null when no accounts available | Routing | Prevents unavailable-account retries | S | HIGH |
| 3 | Fix short-429 race (mark unavailable before sleep) | Routing | Amplified throttling under concurrency | S | HIGH |
| 4 | Redact OAuth URL in user-facing output | Security | CSRF/state leak to stdout/clipboard | S | HIGH |
| 5 | Reconcile redirect host (127.0.0.1 canonical) + SSOT refactor (R2) | Docs/Security | Confirmed drift breaks login; 4+ duplicate sites | M | HIGH |
| 6 | Fix loadPluginConfig CONFIG_PATH precedence | Config | Failing test; config bug | S | HIGH |
| 7 | Fix pack:check + add CI gate | Release | Tarball bloat; release blocker | S | HIGH |
| 8 | Regenerate AGENTS.md (docs truth-up) | Docs | 4-axis drift erodes contributor trust | S | HIGH |
| 9 | Surface SSE malformed-chunk as structured warn | Reliability | Silent data loss | S | HIGH |
| 10 | Normalize active-account pointer on disable/remove | Routing | Dangling pointer UX/state confusion | S | HIGH |
| 11 | Split settings-hub.ts into sub-concerns (R1) | Architecture | 2100 LOC overgrown file | L | MEDIUM |
| 12 | Zod schemas at all JSON.parse boundaries (R3) | Security/Types | 59 raw parse sites | L | MEDIUM |
| 13 | Add connectTimeoutMs distinct from total/stall | Reliability | Diagnostic clarity + connect-vs-upstream disambiguation | S | MEDIUM |
| 14 | Structured logger schema with required correlation fields | Observability | Uneven logs across retry paths | M | MEDIUM |
| 15 | Atomic writes + retry-safe deletes for recovery storage (R6) | Reliability | Mid-write crash risk in recovery | S | MEDIUM |
| 16 | Move test tmp files to os.tmpdir() + shared cleanup helper | Tests | 6 leaking tmp files at repo root | S | MEDIUM |
| 17 | Introduce CodexError/AuthError/NetworkError taxonomy | Errors | Ad-hoc error construction | M | MEDIUM |
| 18 | V2 migration path (match docs claim) or docs correction | Storage | V1↔V3 only; docs overstate | S | MEDIUM |
| 19 | Feature F1 codex auth why-selected + F2 verify --paths | Features | Operator visibility + path self-test | M | MEDIUM |
| 20 | Invariant tests for PKCE S256 + OAuth state 16-byte crypto | Tests | Lock-in positive findings | S | LOW-MEDIUM |
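
The scoring formula can be applied mechanically, as in the sketch below. The severity weights are the ones stated above; the probability scale (0 to 1), the blast-radius scale (1 to 5), and the sample inputs are assumptions chosen for illustration and are not values recorded in the audit.

```typescript
type Severity = "CRITICAL" | "HIGH" | "MEDIUM" | "LOW";

// Weights as stated alongside the formula.
const SEVERITY_WEIGHT: Record<Severity, number> = {
  CRITICAL: 5,
  HIGH: 4,
  MEDIUM: 2,
  LOW: 1,
};

interface ScoredItem {
  id: string;
  severity: Severity;
  probability: number; // assumed scale: 0..1
  blastRadius: number; // assumed scale: 1..5
}

// score = severity_weight × probability × blast_radius
function score(item: ScoredItem): number {
  return SEVERITY_WEIGHT[item.severity] * item.probability * item.blastRadius;
}

// Highest score first; the input array is left untouched.
function rank(items: ScoredItem[]): ScoredItem[] {
  return [...items].sort((a, b) => score(b) - score(a));
}

// Illustrative inputs only.
const sample: ScoredItem[] = [
  { id: "example-medium", severity: "MEDIUM", probability: 1, blastRadius: 3 },
  { id: "example-high", severity: "HIGH", probability: 0.5, blastRadius: 4 },
];
console.log(rank(sample).map((i) => `${i.id}=${score(i)}`));
// [ 'example-high=8', 'example-medium=6' ]
```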

16. Module-by-Module Notes

| Module | Purpose | Strengths | Concerns | Verdict |
|---|---|---|---|---|
| index.ts | 7-step fetch pipeline | 4-gate loop termination (H-09); deprecation header logging (success path) | 7 steps not documented inline (H-01); fallback 429 hardcodes reason (H-06); active-pointer normalization missing (D-05) | Harden: inline step comments + fix fallback paths |
| lib/auth/ | OAuth + PKCE + callback | Strong OAuth state entropy (C-AUTH-02); refresh-guardian taxonomy (C-AUTH-13); auth storage corruption recovery (C-AUTH-09); refresh-queue race prevention (C-10) | Redirect host drift (C-AUTH-03); live URL leak (C-AUTH-05); port duplicated (C-AUTH-04); JWT exp unvalidated (C-AUTH-07); callback server not eager-close (C-AUTH-11); access+refresh tokens plaintext (C-AUTH-08) | Harden: R2 SSOT + secret minimization |
| lib/accounts.ts + lib/accounts/ | Per-account rate-limit tracking + selection | Case-insensitive email dedup (AGENTS.md); health scoring implementation | Hybrid selector returns unavailable (D-01); non-deterministic routing (D-02); health + quota memory volatile (D-03); active pointer can dangle (D-05); project-scoped bypass on CLI sync (D-06); routing races (D-09) | Refactor: R4 mutex + selection record |
| lib/codex-cli/ | CLI state, sync, observability, writer | Existence of observability layer | Unclear boundary vs lib/codex-manager/ (G-06) | Document or merge |
| lib/codex-manager/ | Command dispatcher + settings-hub | Rich command surface; Q=cancel consistency | settings-hub.ts 2100 LOC (G-01); auth list message drift (G-02); --json coverage uneven (G-03); experimental tier undocumented (G-09) | Split: R1 |
| lib/prompts/ | Model-family prompts + GitHub ETag cache | ETag cache is performance-aware | Not deeply audited in this pass | Preserve |
| lib/recovery/ | Conversation state persistence | Defensive against partial writes (via skip-unreadable) | Non-atomic writes (E-03); silent-skip no-log (E-09) | Harden: R6 |
| lib/request/ | Request transform, SSE, failover, backoff | 4-gate termination; structured failure policy; burst cooldown | SSE non-streaming buffers 10MB + silent-skip malformed (H-03); observability uneven (H-05); stream failover bypasses server-error policy (D-08); no connect timeout (H-02) | Harden |
| lib/storage/ | V1↔V3 migrations + worktree resolution | V3 format robust; WAL + backups; Windows removeWithRetry | resolvePath lookalike bypass (E-01/K-02), HIGH; V2 absent (E-05); in-process-only locks (E-04); account-clear ordering (E-07) | Harden: priority on E-01 |
| lib/tools/ | Hashline tool helpers | Scoped, focused | Not deeply audited | Preserve |
| lib/ui/ | ansi, auth-menu, theme, select, copy | Theme live-preview + baseline restore pattern (G-05) | Some text duplicates port 1455 literal (C-AUTH-04) | Preserve: fix R2 duplication |
| scripts/ | install, build, hygiene, benchmarks | 11+ scripts; Windows-safe removeWithRetry; verify-vendor-provenance.mjs; check-pack-budget.mjs; audit-dev-allowlist.js | pack:check FAILS on HEAD (LM-01) | Fix pack:check |
| test/ | 225 files, 3418 tests, 80% coverage | Chaos + property + fixtures; hermetic via env redirection (K-05) | 3 failing tests on HEAD (K-02/K-03/K-04); repo-root tmp leakage (K-09/JN-09); coverage % unverified (K-07) | Harden: fix failures + leakage |
| docs/ | 14+ markdown docs + sub-directories | Strong README taxonomy; full governance; runbooks | Redirect host drift (AUDIT-H5); deriveProjectKey typo (E-06); AGENTS.md stale (AUDIT-H8); CHANGELOG drift unverified (LM-07) | Truth-up |
| .github/workflows/ | ci, pr-ci, codex-plugin-scanner, codeql | Full OSS CI stack; CodeQL + plugin scanner + dep scanner | No perf CI gate (P-06); no pack-size gate (LM-01 preventive) | Extend |
| bench/format-benchmark/ | Code-edit format benchmark | Focused, documented | Hot paths (request, SSE, selection, storage) lack bench (P-02) | Extend |
| vendor/codex-ai-plugin, vendor/codex-ai-sdk | File-protocol vendored deps | vendor:verify provenance script; bundleDependencies | Black-box inside audit scope | Preserve provenance discipline |

17. Final Verdict

Is the codebase structurally healthy?

Yes — it is structurally healthy with known, addressable gaps. This is a serious, well-crafted CLI tool with strong TypeScript discipline (0 as any, 0 @ts-ignore), hermetic test design, active security maintenance, full OSS governance, and a clean supply chain. The architecture boundaries (auth / accounts / request / storage / codex-manager) are sensible; the 7-step request pipeline has safety gates; OAuth uses PKCE with 128-bit CSRF entropy.

The current HEAD (v1.2.7) has 10 HIGH-severity findings across correctness/security/docs that should be addressed before the next minor release. None reach CRITICAL severity with confirmed evidence in this pass, but AUDIT-H1 (resolvePath lookalike bypass) is CRITICAL-candidate pending real-host reproduction.

What is the biggest long-term bottleneck?

The settings-hub monolith (2100 LOC) and the implicit state machines in lib/accounts.ts + lib/rotation.ts. Together they concentrate change risk in single files and blur ownership. Left unsplit, every new feature touches them, regression surface grows, and subtle races (AUDIT-H2/H3/M07) will keep resurfacing.

The secondary bottleneck is absence of a structured error taxonomy + uniform logger schema — makes post-incident reconstruction hard and slows debugging.

What should be implemented first?

Phase 1 Correctness & Safety — fix the 10 HIGH findings and restore a green test baseline. Specifically:

  1. resolvePath lookalike + selectHybridAccount + short-429 race + OAuth URL redaction + loadPluginConfig precedence + pack:check — in parallel tracks
  2. codex auth list empty-storage message + resolvePath test + plugin-config precedence test — restores green baseline
  3. AGENTS.md regen + localhost → 127.0.0.1 docs truth-up

What must NOT be broken during refactoring?

Preserve (cross-referenced to Section 3):

  1. Hermetic test design — HOME + CODEX_MULTI_AUTH_DIR env-redirect pattern; verified zero drift
  2. Strict TypeScript doctrine — 0 as any, 0 @ts-ignore, strict: true
  3. Refresh-queue race prevention — token-keyed dedupe + rotation + rollback on persist fail
  4. 4-gate request-loop termination — attempted.size, outbound budget, MAX_SHORT_RETRY_ATTEMPTS=3, MAX_STREAM_FAILOVERS=1
  5. Atomic writes for primary + flagged + settings — extend, don't regress
  6. Clean supply chain — audit:ci green, vendor:verify green
  7. OSS governance — SECURITY/COC/CONTRIBUTING/LICENSE + CodeQL + scanners
  8. CLI command taxonomy — Start/Daily/Repair/Advanced organization in README
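
Item 4 is the kind of invariant that benefits from an explicit guard. The sketch below shows one way the four gates could be expressed as a single pure check. The two constants and their values come from the list above; the LoopState shape, the reason strings, and the exact comparison semantics are assumptions, since this audit does not reproduce the loop in index.ts.

```typescript
const MAX_SHORT_RETRY_ATTEMPTS = 3;
const MAX_STREAM_FAILOVERS = 1;

// Hypothetical snapshot of the request loop's counters.
interface LoopState {
  attempted: Set<string>; // account ids already tried for this request
  accountCount: number;   // size of the pool
  outboundUsed: number;   // outbound requests sent so far
  outboundBudget: number; // bounded outbound budget
  shortRetries: number;   // short-window retries already performed
  streamFailovers: number; // mid-stream failovers already performed
}

// Returns the first gate that forbids another attempt, or null to continue.
function stopReason(s: LoopState): string | null {
  if (s.attempted.size >= s.accountCount) return "all-accounts-attempted";
  if (s.outboundUsed >= s.outboundBudget) return "outbound-budget-exhausted";
  if (s.shortRetries >= MAX_SHORT_RETRY_ATTEMPTS) return "short-retry-limit";
  if (s.streamFailovers >= MAX_STREAM_FAILOVERS) return "stream-failover-limit";
  return null;
}

const fresh: LoopState = {
  attempted: new Set(),
  accountCount: 2,
  outboundUsed: 0,
  outboundBudget: 4,
  shortRetries: 0,
  streamFailovers: 0,
};
console.log(stopReason(fresh));                          // null
console.log(stopReason({ ...fresh, shortRetries: 3 })); // short-retry-limit
```

Keeping the gates in one function makes it hard for a refactor to drop one silently, and a table-driven test over this function is cheap to maintain.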

Appendix: Evidence Index

Under docs/audits/evidence/:

| File | Source | Purpose |
|---|---|---|
| context.txt | T0b | HEAD SHA, version, node, OS, CI workflows |
| inventory.txt | T1 | LOC/file inventory per module |
| typecheck.txt | T2 | npm run typecheck output (exit 0) |
| test-summary.txt | T3 | Vitest summary (225 files/3418 tests/3 fail) + hermeticity verdict |
| lint.txt | T4 | npm run lint (exit 0) |
| audit-ci.txt | T4 | npm run audit:ci (exit 0) |
| vendor-verify.txt | T4 | npm run vendor:verify (exit 0) |
| pack-check.txt | T4 | npm run pack:check (exit 1); budget violation |
| clean-repo-check.txt | T4 | Hygiene check + 6 tmp files flagged |
| git-forensics.txt | T5 | Churn, blame, regression commits |
| docs-claims.txt | T6 | Docs claim inventory for drift cross-ref |
| redaction-report.txt | T7 | Redaction validation grid (PASS) |
| dim-C-auth.md | T8 | Auth/OAuth/Token dimension (13 findings) |
| dim-D-routing.md | T9 | Multi-account routing dimension (9 findings) |
| dim-E-storage.md | T10 | Storage dimension (9 findings) |
| dim-F-config.md | T11 Atlas | Config/settings/dual-linter (7 findings) |
| dim-G-cli.md | T13 Atlas | CLI/settings-hub (9 findings) |
| dim-H-request.md | T12 Atlas salvage | Request pipeline/SSE/resilience (10 findings) |
| dim-I-types.md | T14 | Type safety sweep (6 findings) |
| dim-JN-errors-health.md | T15 Atlas | Error handling + code health (10 findings) |
| dim-K-tests.md | T16 Atlas | Test strategy + hermeticity (10 findings) |
| dim-LM-release-docs.md | T17 Atlas | Release/CI + docs drift (12 findings) |
| dim-P-perf.md | T17b Atlas | Perf lightweight (6 findings) |

Total distinct findings: ~110 across 16 dimensions.


End of MASTER_AUDIT.md — composed under Sisyphus/Atlas workflow. See Section 3 for strengths to preserve. See Section 14 for the 4-phase roadmap.