Changelog

January 15, 2026 · View on GitHub

All notable changes to the Solokit (Session-Driven Development) project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Added

  • Meaningful Test Guidance in Documentation
    • Added explicit guidance that tests should represent real use cases, not just satisfy coverage metrics
    • Updated PRD_WRITING_GUIDE.md with "Write Meaningful Tests, Not Just Coverage" section
    • Updated writing-specs.md with new tip on scenario-driven vs coverage-driven tests
    • Key principle: "If this feature broke in production, would this test catch it?"
    • Helps prevent false confidence from tests that pass but don't catch real bugs

[0.3.0] - 2026-01-14

Added

  • Minimal Init Mode (sk init --minimal) (PR #207)
    • New --minimal flag for lightweight project initialization
    • Installs only session tracking infrastructure without templates or quality tiers
    • Ideal for simple projects (HTML sites, scripts, prototypes) that don't need testing/linting
    • Creates: .session/ structure, guides, Claude Code slash commands, CLAUDE.md, README.md, CHANGELOG.md
    • Quality gates disabled by default in minimal mode config
    • Git repository and GitHub setup still available
    • Comprehensive test coverage: 28 unit tests, 11 integration tests

Fixed

  • /end Command Fails on New Projects with Few Commits

    • Changed git log --oneline HEAD~10..HEAD to git log --oneline -10 in end.md command
    • The previous syntax fails when a repository has fewer than 10 commits
    • Affects all solokit-initialized projects when running /end early in development
  • Semgrep CI Installation Failure in ml_ai_fastapi Template

    • Replaced unpinned pip install semgrep with official returntocorp/semgrep-action@v1 GitHub Action
    • Fixes CI security workflow failures caused by semgrep dependency resolution cascade
    • The official action handles installation complexity and provides native GitHub integration
    • Affects projects using ml_ai_fastapi template with CI/CD option enabled

[0.2.2] - 2025-12-05

Fixed

  • Missing test_execution.commands in Generated Config (PR #193)

    • Added commands configuration to test_execution in session_structure.py
    • Previously, sk validate would show ✗ tests: None because no test command was configured
    • Now includes default commands for Python (pytest), JavaScript, and TypeScript (npm test)
    • Affects all projects initialized with sk init or sk adopt
  • Validation Message Showing "None" Instead of Error

    • Fixed display bug where validation used dict.get() incorrectly
    • Now properly shows "Tests fail" message instead of "None"

Changed

  • Updated config.schema.json with commands property for test_execution
  • Updated test fixtures and documentation with complete configuration examples

[0.2.1] - 2025-12-04

Security

  • Critical CVE Patches for Next.js and React Templates (PR #191)
    • Updated Next.js from 16.0.1 to 16.0.7 to address CVE-2025-66478 (CVSS 10.0)
    • Updated React from 19.2.0 to 19.2.1 to address CVE-2025-55182
    • Updated react-dom from 19.2.0 to 19.2.1
    • Updated eslint-config-next from 16.0.1 to 16.0.7
    • Updated @next/bundle-analyzer from 16.0.1 to 16.0.7
    • Affects all Next.js templates: fullstack_nextjs, saas_t3, dashboard_refine (all tiers)
    • Reference: https://nextjs.org/blog/CVE-2025-66478

[0.2.0] - 2025-12-03

Fixed

  • Tailwind CSS v4 Migration for All Templates (fullstack_nextjs, saas_t3, dashboard_refine)

    • Migrated from Tailwind v3 @tailwind directives to v4 @import "tailwindcss" syntax
    • Moved theme configuration from JavaScript tailwind.config.ts to CSS @theme blocks
    • Updated components to use theme tokens (bg-background, text-foreground) instead of hardcoded colors
    • Fixed accessibility violation: Changed text-blue-500 to text-blue-700 for WCAG AA contrast compliance
    • Updated all test files to match new theme-based styling expectations
    • Added comprehensive Tailwind v4 documentation to ARCHITECTURE.md files
  • CHANGELOG Update Check False Positives in /end Command

    • Changed check from HEAD~10 (last 10 commits) to main...HEAD (commits since branch creation)
    • Previous logic gave false positives when a prior session had updated CHANGELOG
    • Now correctly detects whether CHANGELOG was updated in the current session/branch

[0.1.7] - 2025-11-28

Changed

  • Improved /end Command Flow (PR #184)

    • Streamlined /end command with clear 5-step process: pre-flight checks, completion status, run sk end, create PR, show results
    • sk end no longer attempts commits - Claude handles git commits, sk end just verifies and pushes
    • Better error messages: Clear guidance when no commits found on branch
    • Removed confusing two-option learning approach (commit tags vs temp file)
  • Slash Command Format for User-Facing Suggestions (PR #186)

    • Updated all user-facing command suggestions to use slash command format (/start, /end, /validate) instead of CLI format (sk start, sk end)
    • Updated 30 files including CLAUDE.md templates, Python CLI output, and guides
    • Principle: User-facing suggestions use / slash format, execution examples use sk CLI format

Fixed

  • CHANGELOG.md Not Being Copied During sk init (PR #184)

    • CHANGELOG.md template now properly copied to project root during initialization
    • Won't overwrite existing CHANGELOG.md files
  • Git Commit Double-Wrapped Error Messages (PR #184)

    • Fixed "Commit failed: Commit failed:" double-wrapped error messages
  • CI Template Permissions and Smoke Test Endpoint (PR #185)

    • Added pull-requests: read permission to secrets-scan job in security.yml (required for gitleaks-action)
    • Updated smoke-test in test.yml to test / instead of /api/health for Node.js templates
    • Smoke tests no longer require database connectivity

Added

  • PRD and STACK Guide References in Templates (PR #183)
    • Added "Writing PRDs" subsection under Claude Behavior Guidelines in all CLAUDE.md templates
    • Updated "Reference Documentation" section with mandatory PRD guide reference
    • Updated "Key Files" table to include PRD_WRITING_GUIDE.md and STACK_GUIDE.md

[0.1.6] - 2025-11-27

Changed

  • Minimal Scaffolding Migration: Phase 4 Documentation & Cleanup

    • Updated main README.md:
      • Changed "production-ready templates" messaging to "minimal scaffolding templates"
      • Added PRD-driven development workflow
      • Updated template descriptions to emphasize documentation over example code
      • Updated test counts to 3,802 tests
      • Added Development Guides section for .session/guides/
    • Updated template-registry.json:
      • All template descriptions now say "Minimal scaffolding for..."
      • Category descriptions updated to reflect minimal scaffolding approach
    • Documentation consistency fixes across all 4 stacks:
      • Added "These are files you will CREATE" clarifications to ARCHITECTURE.md examples
      • Added "target structure you will build" notes to STACK_GUIDE.md file organization sections
      • Fixed path inconsistencies in CLAUDE.md.template files
      • Updated test counts in docs/project/ROADMAP.md and tests/e2e/README.md
    • All 3,802 tests pass
  • Minimal Scaffolding Migration: dashboard_refine (Phase 3.4)

    • Removed example code (~977 lines) from dashboard_refine template:
      • Removed app/(dashboard)/users/ directory (95 + 122 lines) - Example users list page and tests
      • Removed components/forms/user-form.tsx and tests (76 + 206 lines) - Example form component
      • Removed lib/validations.ts and tests (52 + 225 lines) - Example validation schemas
      • Removed tier-1-essential/tests/unit/example.test.tsx (77 lines) - Example tests
      • Removed tier-3-comprehensive/tests/e2e/user-management.spec.ts (124 lines) - User E2E tests
      • Removed tier-3-comprehensive/tests/integration/dashboard.test.ts (28 lines) - Example integration test
    • Updated lib/refine.tsx:
      • Replaced mock data provider with placeholder that throws helpful errors
      • Cleared example resources array
      • Added comprehensive documentation for data provider setup
    • Updated components/layout/sidebar.tsx: Simplified to Dashboard route only
    • Updated app/(dashboard)/page.tsx: Minimal welcome page with guidance cards
    • Updated app/(dashboard)/__tests__/page.test.tsx: Tests for minimal page
    • Updated lib/__tests__/refine.test.tsx: Tests for placeholder data provider
    • Updated components/layout/__tests__/sidebar.test.tsx: Tests for minimal sidebar
    • Updated tier-3-comprehensive/tests/e2e/dashboard.spec.ts: Tests for minimal dashboard
    • Updated providers/__tests__/refine-provider.test.tsx: Fixed mock to use empty resources array
    • Created tier-3-comprehensive/tests/integration/api.test.ts: Placeholder integration test
    • Updated ARCHITECTURE.md:
      • Added "Building From Scratch" section with complete Refine patterns
      • Updated "Decision 2" to explain data provider requirement
      • Added comprehensive "Data Provider Options" section with REST, GraphQL, Supabase, Custom examples
      • Updated project structure to reflect minimal scaffolding
    • Updated CLAUDE.md.template:
      • Added "Building From Scratch" section with quick pattern reference
      • Updated file organization table with correct paths
      • Updated data provider warning to reflect placeholder behavior
    • Cleaned up empty directories (tier-1-essential/tests/unit/, tier-3-comprehensive/tests/integration/)
  • Documentation Consistency Fixes (All 4 stacks)

    • saas_t3 ARCHITECTURE.md: Moved "Building From Scratch" section from end to after Overview
    • ml_ai_fastapi CLAUDE.md.template: Reordered sections (Building From Scratch before Stack Architecture Rules)
    • dashboard_refine CLAUDE.md.template: Reordered sections (Building From Scratch before Stack Architecture Rules)
    • dashboard_refine docker/README.md: Fixed "mock data provider" → "placeholder data provider" terminology
    • fullstack_nextjs CLAUDE.md.template: Removed incorrect src/ prefix from file paths
    • fullstack_nextjs ARCHITECTURE.md: Fixed lib/validations/[feature].tslib/validations.ts
    • saas_t3 CLAUDE.md.template: Removed incorrect src/ prefix from file paths
  • Minimal Scaffolding Migration: saas_t3 (Phase 3.3)

    • Removed example code (~400 lines) from saas_t3 template:
      • Removed server/api/routers/example.ts (33 lines) - tRPC CRUD router
      • Removed server/api/routers/__tests__/example.test.ts (220 lines) - Router tests
      • Removed components/example-component.tsx (22 lines) - tRPC usage example
      • Removed components/__tests__/example-component.test.tsx (117 lines) - Component tests
      • Removed tier-1-essential/tests/unit/example.test.tsx (24 lines) - Example tests
    • Updated server/api/root.ts: Removed example router, kept commented example
    • Updated app/page.tsx: Minimal welcome page (no tRPC usage)
    • Updated app/__tests__/page.test.tsx: Tests for minimal page
    • Updated server/api/__tests__/root.test.ts: Removed example router references
    • Updated prisma/schema.prisma: Removed User model, kept commented example
    • Updated tier-3-comprehensive/tests/e2e/home.spec.ts: Tests for minimal page
    • Updated ARCHITECTURE.md:
      • Added "Building From Scratch" section with complete tRPC patterns
      • Added "Type Safety Flow" diagram showing Prisma → tRPC → React chain
      • Updated project structure to reflect minimal scaffolding
    • Updated CLAUDE.md.template:
      • Added "Building From Scratch" section
      • Updated code patterns to use generic posts example instead of example
    • Added .gitkeep to preserve empty components/ and routers/ directories
  • Minimal Scaffolding Migration: ml_ai_fastapi (Phase 3.2)

    • Removed example code (~430 lines) from ml_ai_fastapi template:
      • Removed src/api/routes/example.py (135 lines) - Full CRUD router
      • Removed src/models/example.py (61 lines) - SQLModel Item model
      • Removed src/services/example.py (115 lines) - Service layer example
      • Removed tests/unit/test_example.py (117 lines) - Example service tests
    • Updated main.py.template: Removed example router import and include
    • Updated models/__init__.py: Removed Item exports, added documentation example
    • Updated core/database.py: Removed Item import, added documentation example
    • Updated alembic/env.py: Removed Item import, added documentation example
    • Updated tests/unit/test_api_routes.py: Removed TestItemRoutes, kept TestHealthRoutes
    • Updated tests/integration/test_api.py: Removed TestItemAPIIntegration, kept TestHealthEndpoints
    • Updated locustfile.py: Removed Item API tests, kept health check tests only
    • Updated tier-4-production/src/core/monitoring.py: Removed example-specific counters
    • Updated ARCHITECTURE.md:
      • Added "Building From Scratch" section with step-by-step guide
      • Updated project structure to reflect minimal scaffolding
    • Updated CLAUDE.md.template:
      • Added "Building From Scratch" section
      • Added "Quick Pattern Reference" for adding new features
  • Minimal Scaffolding Migration: fullstack_nextjs (Phase 3.1)

    • Removed example code (~1,206 lines) from fullstack_nextjs template:
      • Removed app/api/example/ route and tests
      • Removed components/example-component.tsx and tests
      • Removed lib/validations.ts (example Zod schemas) and tests
      • Removed User model from Prisma schema (kept commented example)
    • Updated page.tsx to minimal welcome page (no example imports)
    • Updated ARCHITECTURE.md:
      • Added "Building From Scratch" section with step-by-step guide
      • Removed references to example files in project structure
    • Updated CLAUDE.md.template:
      • Added "Building From Scratch" section
      • Updated guidance to reference ARCHITECTURE.md instead of existing code
    • Added health check test at app/api/health/__tests__/route.test.ts
    • Updated E2E tests (flow.spec.ts) for minimal page
    • Added .gitkeep to preserve empty components/ directory

Added

  • Quality Gate Adjustments for Minimal Scaffolding (Phase 2 of Minimal Scaffolding Migration)

    • New scaffolding.py module with minimal scaffolding detection functions:
      • is_minimal_scaffolding(): Detects if project has only health check code
      • has_integration_test_files(): Checks for integration test files
      • has_e2e_test_files(): Checks for E2E test files (Playwright/Cypress)
    • Integration tests now skip gracefully when no integration test files exist
    • Integration tests handle missing spec files by skipping instead of failing
    • Added 23 new tests for scaffolding module
    • Added test for new integration checker behavior
    • CI/CD workflows already use --if-present for conditional E2E/integration tests
    • Coverage threshold only enforced when coverage report exists
    • Tier configs use glob patterns that gracefully handle empty directories
  • PRD Writing Guide and Stack Selection Guide

    • New STACK_GUIDE.md: Comprehensive guide for choosing between the 4 Solokit stacks
      • Quick decision tree for stack selection
      • Detailed comparison matrix (type safety, API style, learning curve, etc.)
      • Per-stack profiles with best-for/not-ideal-for guidance
      • When-to-switch guidance
    • New PRD_WRITING_GUIDE.md: Complete PRD writing guide for Claude-driven development
      • Rewritten for AI-assisted workflow (Claude writes PRD and implements code)
      • Vertical slices philosophy and INVEST principles
      • Technical constraints section with stack selection
      • Definition of Ready (DoR) checklist for Claude
      • Mapping PRD stories to Solokit work items
      • Claude-optimized PRD template
    • Guides automatically copied to .session/guides/ during sk init
    • Updated post-init message to reference guides and recommend PRD workflow
    • New fixture tracking_template_files_with_guides for testing
    • Added 6 new tests for guide functionality (3,773 → 3,779 total tests)
  • Format/Lint Auto-Fix Before Initial Commit

    • New Step 19 in sk init workflow: runs format and lint auto-fix before initial commit
    • Fixes user-provided files (PRD.md, ROADMAP.md, etc.) that may have formatting issues
    • Node.js projects: runs npm run format (Prettier) and npm run lint:fix (ESLint)
    • Python projects: runs ruff format . and ruff check --fix .
    • Silent operation: only logs at debug level, non-blocking on failure
    • Init flow updated from 20 to 21 steps (GitHub setup moved to Step 21)
    • New module: src/solokit/init/format_lint_fixer.py
    • Added 18 new tests for format_lint_fixer module (3,755 → 3,773 total tests)

[0.1.5] - 2025-11-26

Added

  • GitHub Repository Setup Integration

    • New src/solokit/github/ module for post-init GitHub repository setup
    • Interactive prompts to create new repo or connect to existing one
    • Supports both gh CLI and manual remote configuration
    • Added check_git_installed() and check_gh_installed() to environment validator
    • Integrated as Step 20 in sk init workflow (after initial commit)
    • Added 45 new tests for GitHub setup module (3,710 → 3,755 total tests)
  • Safe Config Implementation for sk adopt

    • Added intelligent file categorization: NEVER_OVERWRITE, MERGE_IF_EXISTS, INSTALL_IF_MISSING
    • New backup system: all modified files backed up to .solokit-backup/<timestamp>/
    • Smart merge strategies for 7 file types: package.json, pyproject.toml, eslint.config.mjs, .prettierrc, .pre-commit-config.yaml, requirements.txt, .husky/pre-commit
    • Added --dry-run flag to preview changes without modifications
    • Improved warning message showing categorized file handling
    • Added 119 new tests for backup and merge modules (3,591 → 3,710 total tests)
    • Test breakdown: 3,452 unit + 178 integration + 80 e2e tests
    • New files: adopt/backup.py, adopt/merge_strategies.py
    • All quality checks passing: ruff, mypy, formatting, 97% coverage

Security

  • Fixed Sentry Security Vulnerability (GHSA-6465-jgvq-jhgp)

    • Upgraded @sentry/nextjs from 10.23.0 to 10.27.0 in all Next.js templates
    • Vulnerability: Sensitive headers leaked when sendDefaultPii is set to true
    • Affected versions: 10.11.0 - 10.26.0
    • Updated: stack-versions.yaml and all tier-4 package.json.tier4.template files
  • Fixed npm audit vulnerabilities

    • Added tmp: 0.2.5 override to fix @lhci/cli vulnerability
    • All Next.js templates now pass npm audit with 0 vulnerabilities

Fixed

  • TypeScript Type Conflicts in E2E Tests

    • Fixed AxeBuilder type conflict between @playwright/test and @axe-core/playwright
    • Applied as any cast workaround in all tier-3 e2e test files
    • Affected: saas_t3, dashboard_refine, fullstack_nextjs templates
  • Prettier Formatting Issues in Templates

    • Fixed ARCHITECTURE.md and CLAUDE.md.template formatting across all stacks
    • Fixed blank line handling in readme_generator.py and claude_md_generator.py
    • All templates now pass prettier --check
  • E2E Test Python Executable

    • Fixed test_core_session_workflow.py to use sys.executable instead of hardcoded python3

[0.1.4] - 2025-11-24

Added

  • Comprehensive Test Coverage Improvements

    • Increased overall test coverage from 93% to 96% (+3%)
    • Added 242 new tests (2,983 → 3,225 total tests)
    • Test breakdown: 2,980 unit + 165 integration + 80 e2e tests
    • Files improved to >95% coverage:
      • protocols.py: 65% → 100%
      • dependency_installer.py: 78% → 100%
      • readme_generator.py: 80% → 100%
      • git_context.py: 83% → 98%
      • environment_validator.py: 85% → 100%
      • initial_commit.py: 85% → 100%
      • env_generator.py: 86% → 100%
      • git_hooks_installer.py: 89% → 96%
      • template_installer.py: 89% → 100%
      • gitignore_updater.py: 90% → 99%
      • extractor.py: 85% → 96%
      • curator.py: 86% → 97%
      • archiver.py: 90% → 100%
      • documentation_loader.py: 90% → 100%
      • spec_validator.py: 86% → 100%
      • updater.py: 86% → 98%
      • repository.py: 91% → 100%
      • tree.py: 85% → 98%
      • integration_runner.py: 86% → 100%
      • performance.py: 87% → 99%
      • formatter.py: 88% → 97%
      • query.py: 88% → 95%
      • quality/checkers/base.py: 89% → 100%
      • quality/reporters/base.py: 86% → 100%
    • Created 4 new test files:
      • tests/unit/quality/checkers/test_base.py
      • tests/unit/session/briefing/test_documentation_loader.py
      • tests/unit/session/briefing/test_formatter.py
      • tests/unit/session/briefing/test_git_context.py
    • All quality checks passing: ruff, mypy, formatting
  • Three-File Documentation Model for Project Initialization

    • Implemented comprehensive documentation structure with distinct purposes:
      • README.md: Quick start guide (generated, project-specific)
      • ARCHITECTURE.md: Technical documentation (static template, comprehensive)
      • CLAUDE.md: AI guidance for Claude Code (generated from template)
    • Created ARCHITECTURE.md files for all 4 stacks with comprehensive technical documentation:
      • Architecture decisions with rationale and trade-offs
      • Code patterns and examples
      • Project structure explanations
      • Database workflows
      • Troubleshooting guides
    • Created CLAUDE.md.template files for all 4 stacks with:
      • Stack-specific architecture rules and patterns
      • Comprehensive Solokit command usage guide
      • Claude behavior guidelines
      • Work item management instructions
      • Session workflow documentation
      • Learning capture best practices
      • Stack-specific anti-patterns and common mistakes
    • Added claude_md_generator.py module for CLAUDE.md generation
    • Integrated CLAUDE.md generation into sk init workflow (Step 6)
    • Updated orchestrator step numbering to be sequential (1-19)
    • Affects: All stacks (saas_t3, ml_ai_fastapi, dashboard_refine, fullstack_nextjs)

Changed

  • README Generator Improvements

    • Implemented cumulative quality gates (each tier includes all previous tiers' requirements)
    • Added stack-aware quality gates from template registry:
      • JavaScript stacks: E2E tests (Playwright), Bundle analysis, Lighthouse CI
      • Python stacks: Load testing (Locust), API documentation (OpenAPI), Performance profiling
    • Fixed uvicorn command for Python stack: uvicorn src.main:app --reload (was main:app)
    • Added environment setup section with .env.local instructions
    • Added database setup section (Prisma for npm stacks, Alembic for Python stacks)
    • Improved additional options display using registry names and descriptions
    • Added ARCHITECTURE.md reference section to all generated READMEs
    • Affects: All stacks
  • Template Registry Enhancements

    • Made quality gates stack-aware with adds_js and adds_python fields
    • Tier 3 quality gates now separate JavaScript-specific (Playwright) from Python-specific (Locust)
    • Tier 4 quality gates now separate JavaScript-specific (Bundle analysis, Lighthouse) from Python-specific (OpenAPI, Performance profiling)
    • Removed stack_specific field (replaced with cleaner adds_js/adds_python structure)
    • Affects: template-registry.json

Fixed

  • CI Workflow Improvements for GitHub Actions

    • Fixed Lighthouse CI Chrome sandbox issues on GitHub Actions runners
      • Added puppeteerLaunchArgs with --no-sandbox, --disable-dev-shm-usage, --disable-gpu flags
      • Simplified lighthouse npm script to just lhci autorun
      • Added Playwright browser installation step to lighthouse workflow
    • Fixed Gitleaks failing on initial commit (no parent to compare against)
      • Added condition to skip secrets-scan on initial push events
      • PRs always run secrets-scan correctly
    • Fixed dependency-review-action failing when Dependency Graph not enabled
      • Added conditional check via GitHub API before running dependency review
      • Shows warning if Dependency Graph is not enabled instead of failing
    • Removed custom CodeQL workflow to avoid conflicts with GitHub's default setup
      • GitHub's default CodeQL setup is recommended (enable in repo Settings > Security)
    • Added /api/health endpoint to all Next.js stacks for smoke tests
    • Affects: All Next.js stacks (fullstack_nextjs, saas_t3, dashboard_refine)
  • Python Stack (ml_ai_fastapi) CI Workflow Fixes

    • Upgraded FastAPI from 0.115.6 to 0.121.3 to resolve starlette version conflict
    • Added setuptools>=78.1.1 to security dependencies to fix pip-audit vulnerability
    • Fixed vulture false positives by renaming unused parameters to _logger and _method_name
    • Fixed Pyright missing imports by installing all deps (dev + prod) in quality-check workflow
    • Fixed detect-secrets to handle missing .secrets.baseline gracefully
    • Fixed pip-audit to run without --require-hashes (requirements.txt typically doesn't have hashes)
    • Added conditional Dependency Graph check for dependency-review-action
    • Fixed cosmic-ray installation for mutation testing workflow
    • Removed .secrets.baseline from .gitignore (should be tracked in git)
    • Affects: ml_ai_fastapi stack
  • Playwright Browser Installation and System Dependencies

    • Fixed Playwright browsers not launching on Linux due to missing system dependencies
    • Added automatic apt_pkg Python module fix for Ubuntu 22.04 (common symlink issue)
    • Added automatic sudo npx playwright install-deps execution on Linux during sk init
    • Browser binaries are now properly installed AND system dependencies are configured
    • Impact: E2E tests, A11y tests, and Lighthouse CI now work out-of-the-box on fresh Linux VMs
    • Affects: All Next.js stacks (fullstack_nextjs, saas_t3, dashboard_refine) at tier-3+
  • Lighthouse CI Chrome Detection

    • Fixed "Chrome installation not found" error in Lighthouse CI on local/VM environments
    • Added scripts/lighthouse.sh wrapper that auto-detects Chrome/Chromium location
    • On GitHub Actions: Uses pre-installed Chrome at /usr/bin/google-chrome-stable
    • On local/VM: Falls back to Playwright's Chromium if system Chrome not found
    • No longer requires manual Chrome installation for local development
    • Affects: All Next.js stacks at tier-4-production
  • ESLint Configuration Deprecation Warning

    • Fixed ".eslintignore file is no longer supported" warning in ESLint 9+
    • Migrated all ignore patterns from .eslintignore to eslint.config.mjs ignores array
    • Removed deprecated .eslintignore files from all tier-1-essential templates
    • Added comprehensive ignore patterns: playwright-report, test-results, .stryker-tmp, .lighthouseci, etc.
    • Affects: All Next.js stacks (fullstack_nextjs, saas_t3, dashboard_refine)
  • Template File Formatting

    • Fixed Prettier formatting issues in template files
    • Re-formatted all template files using project's .prettierrc config (printWidth: 100)
    • Previously formatted with default Prettier settings causing format check failures
    • Affects: All stacks
  • Test Script Lighthouse CI Support

    • Added lighthouse.yml workflow parsing to test_all_templates.py
    • Lighthouse CI checks now run for tier-4-production projects regardless of ci_cd option
    • Fixed early return condition that skipped workflow checks when ci_cd option not selected

Added

  • README Documentation for E2E, A11y, and Lighthouse

    • Added Accessibility Testing section to README for projects with a11y option
    • Added Lighthouse CI section to README for tier-4-production projects
    • Documents that Playwright's Chromium is used automatically for Lighthouse
  • New User VM Test Guide

    • Added comprehensive testing guide: analysis-docs/NEW_USER_VM_TEST_GUIDE.md
    • Step-by-step instructions for testing Solokit on fresh GCP VMs
    • Automated test script: scripts/test-new-user-experience.sh
    • Tests all 4 stacks with tier-4 and all options
  • Phase 4 Test Failures for Next.js Stacks

    • Fixed fullstack_nextjs mutation test failures by creating tier-specific Jest environment configurations
    • Added tier-3 test file overrides using @stryker-mutator/jest-runner/jest-env/node for API route tests
    • Kept tier-1/tier-2 test files with standard @jest-environment node for compatibility
    • Fixed Lighthouse CI workflow placement: moved from a11y option to tier-4-production (where script exists)
    • Created dedicated lighthouse.yml workflows in tier-4-production for all Next.js stacks
    • Added PORT environment variable support to Playwright configs for parallel test execution
    • Updated test script to provide DATABASE_URL environment variable for Next.js dev servers
    • Optimized test timeouts: mutation tests (600s), regular tests (300s), default (120s)
    • Impact: All 192 phase-4 tests now passing across all stacks (saas_t3, dashboard_refine, fullstack_nextjs, ml_ai_fastapi)
    • Affects: fullstack_nextjs, saas_t3, dashboard_refine (all tiers with ci_cd option)
    • Files modified:
      • Template files: 14 files (playwright configs, test files, workflow files)
      • Test infrastructure: test_all_templates.py (environment and timeout configuration)
  • Tier-Aware CI/CD Workflows for ml_ai_fastapi

    • Made CI/CD workflows respect tier-based tool availability to prevent false failures
    • Added conditional execution for pylint duplicate code check (tier-3+ only) using if: hashFiles('.pylintrc') != ''
    • Added conditional execution for cosmic-ray mutation tests (tier-3+ only) via config detection step
    • Added conditional execution for Bandit security linting (tier-2+ only) via config detection step
    • Updated test script to evaluate GitHub Actions hashFiles() expressions and skip steps with failing conditions
    • Updated test script to skip conditional check steps (e.g., "Check if cosmic-ray config exists")
    • Added --stack option to test script for running all phase-4 tests for a specific stack (48 tests per stack)
    • Impact: Prevents tier-1 and tier-2 tests from failing due to missing tier-3 quality tools
    • Affects: ml_ai_fastapi (all tiers with ci_cd option)
    • Fixed 24 out of 48 ml_ai_fastapi phase-4 test failures

Added

  • Code Duplication Detection for Python Stack (Session 4)

    • Added pylint 3.3.3 with duplicate-code checking to ml_ai_fastapi tier-3 dependencies
    • Created .pylintrc configuration for code duplication thresholds
    • Added duplication check step to ml_ai_fastapi quality-check.yml workflow
    • Updated stack-versions.yaml with pylint version and installation command
    • Updated test script to recognize and run pylint commands
    • Impact: All 4 stacks now have consistent code duplication detection at tier-3+
    • JavaScript stacks: jscpd, Python stack: pylint
    • Resolves: Session 4 (Code Duplication Detection) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
  • Type Coverage Enforcement for Python Stack (Session 5)

    • Added mypy type coverage check to ml_ai_fastapi quality-check.yml workflow
    • Configured mypy with --disallow-untyped-defs and --disallow-incomplete-defs flags
    • Updated test script to recognize and run mypy commands
    • Impact: All 4 stacks now enforce type coverage at tier-3+
    • JavaScript stacks: type-coverage tool (95%), Python stack: mypy strict checking
    • Resolves: Session 5 (Type Coverage Enforcement) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
  • Comprehensive Unit Tests for Python Stack (Session 6)

    • Added 29 comprehensive unit tests achieving 94.54% coverage (up from 75.63%)
    • Created test_api_routes.py: Tests for API endpoints and health checks with error scenarios
    • Created test_database.py: Tests for database connections and dependency injection with mocking
    • Created test_main.py: Tests for application startup, lifespan, and API documentation endpoints
    • Coverage breakdown: Dependencies (100%), Database (100%), Main app (100%), Models (100%), Services (100%)
    • Impact: Any coverage threshold (60%, 80%, 90%) selected during initialization will now pass
    • Resolves: Session 6 (Unit Tests with Coverage) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
  • Integration Tests Verification (Session 7)

    • Verified all 4 stacks have integration tests properly configured in tier-3-comprehensive
    • Confirmed integration test scripts in package.json/pytest for all stacks
    • Verified test.yml workflows run integration tests without continue-on-error flags
    • JavaScript stacks: Jest/Vitest with integration test directories
    • Python stack: pytest with real HTTP client integration tests
    • Impact: All stacks at tier-3+ have working integration tests in CI
    • Resolves: Session 7 (Integration Tests) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md

Fixed

  • E2E Tests Failing in fullstack_nextjs (Session 8)

    • Fixed PrismaClientInitializationError during E2E tests caused by database queries in Server Components
    • Modified app/page.tsx to gracefully handle database connection errors with try-catch
    • Added fallback data for E2E test environment when database is unavailable
    • Fixed TypeScript type errors in lib/tests/prisma.test.ts using Object.defineProperty
    • Impact: E2E tests now pass without requiring database setup, matching saas_t3/dashboard_refine patterns
    • Root cause: fullstack_nextjs uses Server Components with direct Prisma queries (unlike other stacks)
    • Affects: fullstack_nextjs template, all tiers
    • Resolves: Session 8 (E2E Tests) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
  • Mutation Testing Configuration Consistency (Session 9)

    • Fixed ml_ai_fastapi mutation testing to match JavaScript stack tiering pattern
    • Replaced mutmut with Cosmic Ray 8.4.3 (mutmut incompatible with src/ directory layouts)
    • Removed [cosmic-ray] configuration from base, tier-1, and tier-2 templates
    • Updated tier-3 and tier-4 pyproject.toml to use cosmic-ray==8.4.3 in quality dependencies
    • Deleted obsolete mutmut_config.py.template from tier-3-comprehensive
    • Updated CI/CD workflow to use cosmic-ray commands (cosmic-ray init/exec/cr-report)
    • Updated stack-versions.yaml with cosmic-ray 8.4.3
    • Verified cosmic-ray session creation (200 mutation jobs) on test projects
    • Added tests pattern to Jest testMatch in 6 JavaScript configs for better test discovery
    • Enhanced saas_t3 test coverage to 92.85% function coverage
    • Impact: Mutation testing now introduced in tier-3, inherited by tier-4 (consistent across all stacks)
    • Affects: ml_ai_fastapi (all tiers), saas_t3/dashboard_refine/fullstack_nextjs (tier-3/tier-4)
    • Resolves: Session 9 (Mutation Testing) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
  • Production Build Quality Gate (Session 10)

    • Added explicit production build step to quality-check.yml for all 3 Next.js stacks
    • Production build now runs as final quality gate before PR merge
    • Updated test script to handle build steps context-aware (skip in setup jobs, run in quality jobs)
    • Enhanced npm command parsing to handle all npm commands (not just npm run)
    • Impact: Build failures now caught during quality checks, preventing broken production builds
    • Affects: saas_t3, dashboard_refine, fullstack_nextjs (all tiers with CI/CD option)
    • Resolves: Session 10 (Production Build) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
  • Bundle Analysis Integration (Session 11)

    • Integrated @next/bundle-analyzer in next.config.ts for all tier-4 Next.js stacks
    • Bundle analyzer enabled via ANALYZE=true environment variable
    • Bundle analysis job added to build.yml workflow (uploads artifacts for 30 days)
    • Updated test script to parse security.yml and build.yml workflows
    • Impact: All tier-4 production templates now have bundle size monitoring in CI/CD
    • Affects: saas_t3/tier-4-production, dashboard_refine/tier-4-production, fullstack_nextjs/tier-4-production
    • Resolves: Session 11 (Bundle Analysis) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
  • Security Scanning Enforcement (Session 12)

    • Removed continue-on-error flags from all security checks across all 4 stacks
    • JavaScript stacks: npm audit and dependency-review-action now fail CI on vulnerabilities
    • Python stack: Bandit, pip-audit, and Semgrep now fail CI on security issues
    • Fixed .bandit configuration syntax from INI/Python hybrid to proper YAML format
    • Added .eslintignore files to all JavaScript stacks to exclude generated report files
    • Updated .prettierignore to include report/ directory in all JavaScript stacks
    • Removed duplicate quality checks (type check, lint) from build.yml in all Next.js stacks
    • Added security.yml workflow to test script parsing
    • Impact: Security vulnerabilities now block CI/CD pipeline instead of being warnings
    • Affects: All 4 stacks (saas_t3, dashboard_refine, fullstack_nextjs, ml_ai_fastapi), all tiers with CI/CD
    • Resolves: Session 12 (Security Scanning) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
  • Accessibility Testing Infrastructure (Session 13)

    • Tagged all accessibility tests with @a11y marker for proper test discovery
    • saas_t3: 1 test tagged (home.spec.ts)
    • fullstack_nextjs: 1 test tagged (flow.spec.ts)
    • dashboard_refine: 3 tests tagged (dashboard.spec.ts, user-management.spec.ts)
    • npm run test:a11y now correctly finds and runs all 5 accessibility tests
    • Tests use @axe-core/playwright to scan for WCAG 2.0/2.1 Level A & AA violations
    • Impact: Accessibility testing workflow now functional for all Next.js stacks
    • Affects: saas_t3, dashboard_refine, fullstack_nextjs (tier-3+, a11y option)
    • Resolves: Session 13 (Accessibility Testing) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
  • Lighthouse CI Configuration & Build Workflow Optimization (Session 14)

    • Removed overly strict "lighthouse:recommended" preset from Lighthouse configuration
    • Now uses explicit assertions for meaningful metrics: 90% category scores + Core Web Vitals
    • Increased LCP threshold for dashboard_refine to 3500ms (accounts for Refine framework overhead)
    • Removed duplicate build job from build.yml workflow in all Next.js stacks
    • Simplified build.yml to only bundle-analysis job (builds independently with ANALYZE=true)
    • Added Prisma client generation step to quality-check.yml for saas_t3 and fullstack_nextjs
    • Production build now solely in quality-check.yml as proper quality gate
    • Impact: Cleaner CI/CD workflows, no redundant builds, Lighthouse focuses on practical metrics
    • Affects: saas_t3, dashboard_refine, fullstack_nextjs (tier-4-production, a11y option)
    • Resolves: Session 14 (Lighthouse CI) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
  • Complexity Analysis Enforcement (Session 15)

    • Added ESLint complexity rules to all JavaScript stack tier-3 and tier-4 configs
    • JavaScript complexity rules: cyclomatic (max 10), max-depth (4), max-nested-callbacks (4), max-lines-per-function (100)
    • Test files exempted from max-nested-callbacks and max-lines-per-function (describe/it blocks naturally nest deeply)
    • Updated Python Radon check to enforce thresholds: radon cc --max B fails build if complexity exceeds grade B
    • ESLint handles both linting and complexity for JavaScript (industry standard approach)
    • Radon provides separate complexity analysis for Python (distinct from ruff linting)
    • Impact: All stacks at tier-3+ now enforce code complexity standards to maintain readability
    • JavaScript: Complexity violations show as ESLint errors during npm run lint
    • Python: Complexity violations fail radon cc step in quality-check.yml workflow
    • Affects: All 4 stacks (saas_t3, dashboard_refine, fullstack_nextjs, ml_ai_fastapi), tier-3+
    • Resolves: Session 15 (Complexity Analysis) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
  • Dead Code Detection (Session 16)

    • Added ts-prune check to all JavaScript stack quality-check.yml workflows
    • JavaScript: npm run check:unused runs ts-prune to detect unused exports
    • ts-prune already installed in tier-3 package.json (version 0.10.3) with check:unused script
    • Updated Python Vulture configuration to reduce false positives
    • Increased Vulture min-confidence from 80 to 90 (fewer false positives)
    • Added explicit excludes for tests, pycache, and alembic/versions
    • Added ignore-names for common protocol parameters (method_name, logger) required by frameworks
    • Fixed test script to use shlex.split() instead of str.split() for proper handling of quoted arguments
    • Impact: All stacks at tier-3+ now detect and prevent dead/unused code
    • JavaScript: ts-prune detects unused exports during quality checks
    • Python: Vulture detects unused code in src/ with 90% confidence threshold
    • Affects: All 4 stacks (saas_t3, dashboard_refine, fullstack_nextjs, ml_ai_fastapi), tier-3+
    • Resolves: Session 16 (Dead Code Detection) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
  • Template Registry Documentation Update (Session 18)

    • Updated template-registry.json to accurately reflect all implemented quality checks
    • Tier-3 comprehensive now documents all tools: ts-prune, jscpd, vulture, pylint, Radon, cosmic-ray
    • Tier-4 production now includes: Bundle analysis, Lighthouse CI, structured logging
    • Added stack_specific sections to clarify JavaScript vs Python tooling differences
    • Updated tier-3 to specify "E2E tests (Playwright for JS stacks)" - Python stack uses integration tests
    • Updated tier-4 to document actual features: Bundle analysis (@next/bundle-analyzer), Lighthouse CI (90% scores)
    • Changed tier-4 description from "Operations + Deployment" to "Operations + Monitoring + Performance"
    • Updated metadata last_updated date to 2025-11-19
    • Impact: Documentation now matches implementation, users know exactly what tools are used at each tier
    • Affects: template-registry.json (user-facing documentation)
    • Resolves: Session 18 (Update Template Registry Documentation) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md

Fixed (from previous sessions)

  • Template Type Check Failures Across All Stacks

    • Fixed dashboard_refine: Refine v5 Pagination API changed from current to currentPage
      • Updated lib/__tests__/refine.test.tsx to use correct Pagination interface
    • Fixed fullstack_nextjs: TypeScript read-only property errors in test environment setup
      • Updated lib/__tests__/prisma.test.ts to use type assertions for process.env.NODE_ENV
    • Fixed ml_ai_fastapi: Pyright unable to resolve imports (missing venv configuration)
      • Added venvPath and venv settings to pyrightconfig.json
    • Fixed import ordering issues in ml_ai_fastapi template files (13 files)
    • Impact: All CI/CD type checks now pass for tier-4-production across all stacks
    • Affects: All templates (dashboard_refine, fullstack_nextjs, saas_t3, ml_ai_fastapi), all tiers
    • Resolves: Session 2 (Type Checking) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
  • Pre-commit Option Conflict with Tier-2+ Templates

    • Removed redundant "Pre-commit" additional option from initialization
    • Tier-2+ templates already include Husky git hooks built-in (.husky/pre-commit)
    • Python pre-commit framework option was creating duplicate hook systems
    • Deleted all template pre-commit/ directories (saas_t3, ml_ai_fastapi, fullstack_nextjs, dashboard_refine)
    • Updated init.py to show only 3 additional options: CI/CD, Docker, Env Templates
    • Updated template-registry.json and template_installer.py
    • Updated all documentation and command files to reflect the change
    • Updated all test files to remove pre_commit from test cases
    • Impact: Cleaner user experience, no conflicting hook systems, tier-1 users can upgrade to tier-2 for git hooks
    • Affects: All templates, all tiers
    • Rationale: Tier-2+ already has Husky (90% of users), tier-1 users can upgrade to tier-2, advanced users can manually install Python pre-commit if needed
  • Critical Tier-3 and Tier-4 Test Suite Failures

    • Fixed Jest configuration to exclude Playwright e2e tests from Jest runs (6 files)
      • Added testMatch to only run unit and integration tests
      • Added testPathIgnorePatterns to exclude /tests/e2e/
      • Added transformIgnorePatterns for ESM dependencies (superjson, @trpc)
      • Added moduleNameMapper for path aliases
      • Created dedicated jest.config.ts files for tier-3 and tier-4 templates
    • Fixed package.json test scripts to separate test types (6 files)
      • Added test:unit - Run unit tests only
      • Added test:integration - Run integration tests only
      • Added test:e2e - Run Playwright e2e tests only
      • Added test:all - Run all test types sequentially
      • Updated tier-3 and tier-4 package.json templates for all stacks
    • Replaced broken integration test examples with working placeholders (3 files)
      • Removed ESM server imports that caused "Cannot use import statement outside a module" errors
      • Added educational placeholder tests demonstrating proper structure
      • Tests now pass immediately after project initialization
    • Impact: All tier-3 and tier-4 projects now have working test suites out of the box
    • Affects: saas_t3, fullstack_nextjs, dashboard_refine templates
    • Resolves: "TransformStream is not defined" and ESM import errors

Added

  • Urgent Flag for Single Immediate-Priority Work Items

    • Added --urgent flag to sk work-new command for marking work items that require immediate attention
    • Exclusive single-item constraint: only ONE work item can be urgent at a time
    • User confirmation prompt when setting a new urgent item (clears existing urgent flag)
    • Urgent items override all priority levels and dependency ordering
    • Visual ⚠️ indicator in sk work-list output for urgent items
    • sk work-next always returns urgent items first, ignoring dependencies
    • Added --clear-urgent flag to sk work-update command for manual clearing
    • Auto-clear urgent flag when work item status changes to completed
    • Added urgent status question to /work-new slash command (interactive UI)
    • Backward compatible: work items without urgent field default to false
    • Added 24 unit tests (repository, scheduler, updater) with 90%+ coverage
    • Added 11 integration tests for end-to-end urgent workflow
    • Updated command documentation: work-new.md, work-update.md, work-list.md, work-next.md
  • Essential CLI Commands: help, version, doctor, config show

    • Added sk help command to display all commands organized by category
      • sk help <command> shows detailed help for specific commands with usage, options, and examples
      • Global --help and -h flags supported
      • Created src/solokit/commands/help.py with comprehensive command documentation
    • Added sk version command to display version information
      • Shows Solokit version, Python version, and platform
      • Global --version and -V flags supported
      • Created src/solokit/commands/version.py
    • Added sk doctor command for comprehensive system diagnostics
      • Checks Python version (>= 3.11.0), git installation, project structure
      • Validates config.json and work_items.json integrity
      • Verifies quality tools availability (pytest, ruff)
      • Provides actionable suggestions for failed checks
      • Returns exit code 0 if all pass, 1 if any fail
      • Created src/solokit/commands/doctor.py
    • Added sk config show command to display configuration
      • Shows config file path and formatted configuration
      • Validates configuration and displays status
      • --json flag for machine-readable output
      • Created src/solokit/commands/config.py
    • Updated CLI routing in src/solokit/cli.py to support new commands
    • Running sk with no arguments now shows help (instead of error)
    • Added 27 new unit tests for all commands (100% passing)
    • Updated README.md with utility commands section
    • Updated docs/guides/troubleshooting.md to reference sk doctor as first troubleshooting step

Fixed

  • High: Urgent Flag Not Cleared on Session Completion

    • Fixed urgent flag persisting on completed work items when using sk end command
      • Session completion now uses WorkItemUpdater instead of direct JSON manipulation
      • Ensures urgent flag is automatically cleared when work item status changes to completed
      • Behavior now consistent with sk work-update <id> --status completed
      • Updated src/solokit/session/complete.py to use repository pattern for status updates
    • Added --set-urgent flag to sk work-update command for setting urgent status
      • Allows promoting existing work items to urgent status
      • Automatically clears urgent flag from other items (single-item constraint)
      • Complements existing --clear-urgent flag for complete CLI control
      • Updated src/solokit/work_items/updater.py with set_urgent field handling
    • Updated help documentation for urgent flags
      • Added --set-urgent and --clear-urgent to work-update command help
      • Added --urgent flag to work-new command help examples
      • Updated src/solokit/commands/help.py with complete option descriptions
      • Updated .claude/commands/work-update.md and template version
    • Added integration test for urgent flag clearing on session completion
      • New test: test_auto_clear_urgent_on_session_completion in test_urgent_workflow.py
      • Verifies end-to-end workflow with session completion
      • All 12 urgent workflow integration tests passing
    • Impact: Work lists now correctly show/hide ⚠️ symbol based on actual urgent status
    • Users no longer need manual cleanup after completing urgent work items
    • Complete CLI support for urgent flag lifecycle (create, set, clear, auto-clear)
  • Critical: Next.js 16 Template Initialization Issues

    • Fixed missing ts-node dependency causing Jest to fail parsing TypeScript config files
      • Added "ts-node": "10.9.2" to devDependencies in all 15 Next.js package.json templates
      • Affects all 3 Next.js templates (saas_t3, fullstack_nextjs, dashboard_refine) × 5 tiers each
      • Resolves error: Jest: 'ts-node' is required for the TypeScript configuration files
    • Fixed deprecated next lint command removed in Next.js 16
      • Changed "lint": "next lint" to "lint": "eslint ." in all 15 templates
      • Updated "lint:fix" script in dashboard_refine templates to use direct ESLint
      • Resolves cryptic error: Invalid project directory provided, no such directory: .../lint
    • Fixed ESLint 9 incompatibility with legacy config format
      • Replaced .eslintrc.json (legacy) with eslint.config.mjs (flat config) in all 3 templates
      • Added "globals": "16.5.0" package to tier-1-essential in all 3 templates
      • Configured proper globals for Node.js, browser, React, and Jest environments
      • Resolves error: ESLint couldn't find an eslint.config.(js|mjs|cjs) file
    • Fixed linting validation being skipped during quality gates check
      • Updated src/solokit/init/session_structure.py to include linting commands in quality gates config
      • Added commands section with language-specific linting commands (python, javascript, typescript)
      • Validation now properly runs npm run lint instead of reporting "no command for typescript"
    • Fixed linting errors in template example code
      • Removed unused ctx parameters from tRPC example router (saas_t3 template)
      • Template code now passes linting without errors after initialization
    • Impact: All 3 Next.js templates now work correctly across all quality tiers (base through tier-4)
    • Users can successfully initialize projects without manual workarounds
    • Quality gates validation (/validate, /end) now properly check linting instead of skipping
    • Linting works out-of-the-box with ESLint 9 flat config
    • All 2,936 tests passing with zero regressions
  • Critical: CI/CD Workflow Failures in Template Projects

    • Fixed CodeQL permission error causing Security workflow to fail on push to main
      • Added actions: read permission to CodeQL jobs in all template security.yml files
      • Resolves error: "Resource not accessible by integration" when accessing workflow metadata
      • Affects: saas_t3, fullstack_nextjs, dashboard_refine templates
    • Fixed CodeQL and secrets-scan jobs running on pull requests without required permissions
      • Added if: github.event_name != 'pull_request' conditional to skip on PRs
      • These jobs require write permissions not available in PRs from forks
      • Prevents workflow failures on external contributions
      • Affects: saas_t3, fullstack_nextjs, dashboard_refine templates
    • Fixed dependency-review failing on repositories without GitHub Advanced Security
      • Added continue-on-error: true to dependency-review step
      • Allows workflow to pass even when Advanced Security is not available (free repositories)
      • Resolves error: "Dependency review is not supported on this repository"
      • Affects: saas_t3, fullstack_nextjs, dashboard_refine templates
    • Fixed Deploy workflow failures when production secrets are not configured
      • Added conditionals to skip deployment steps when secrets are empty/missing
      • Database migrations: if: ${{ secrets.DATABASE_URL != '' }}
      • Vercel deployment: if: ${{ secrets.VERCEL_TOKEN != '' }}
      • Sentry releases: if: ${{ secrets.SENTRY_AUTH_TOKEN != '' }}
      • Lighthouse CI: if: ${{ secrets.LHCI_GITHUB_APP_TOKEN != '' }}
      • Python templates: STAGING_DATABASE_URL, RAILWAY_TOKEN, DOCKER_REGISTRY, DEPLOY_KEY
      • Affects: saas_t3, fullstack_nextjs, dashboard_refine, ml_ai_fastapi templates
    • Fixed missing npm script errors in test and build workflows
      • Changed to npm run --if-present test:integration for integration tests
      • Changed to npm run --if-present test:e2e for E2E tests
      • Changed to npm run --if-present analyze for bundle analysis
      • Scripts gracefully skip if not defined in package.json (tier-1/tier-2 projects)
      • Resolves errors: "Missing script: test:integration/test:e2e/analyze"
      • Affects: saas_t3, fullstack_nextjs, dashboard_refine templates
    • Impact: New projects can now merge PRs without CI failures
    • All CI workflows pass on tier-1-essential projects (base configuration)
    • Deploy workflows gracefully skip steps when production infrastructure isn't configured yet
    • Users can set up production secrets and advanced test suites incrementally without errors
    • Fixed 11 workflow files across 4 templates (security.yml, deploy.yml, test.yml, build.yml)
  • Critical: Phase 2 Terminal Testing - Final 11 UX Issues (All 18 Issues Now Complete)

    • Fixed .session/ directory causing uncommitted changes warnings (#9 - Critical)
      • Added .session/ to .gitignore in all 4 stack templates (saas_t3, ml_ai_fastapi, dashboard_refine, fullstack_nextjs)
      • Templates now properly exclude session tracking from git by default
    • Fixed DOT syntax error in work-graph SVG generation (#4/#5 - Critical)
      • Changed from invalid "bold, color=red" to valid DOT syntax 'style=bold, color=red'
      • Updated src/solokit/visualization/dependency_graph.py:169
      • SVG graph generation now works correctly with Graphviz
    • Changed uncommitted changes from ERROR to INFO level in sk start (#8 - High)
      • Updated src/solokit/session/briefing/git_context.py to handle WorkingDirNotCleanError gracefully
      • Users no longer see ERROR logs for normal uncommitted changes during development
    • Added progress messaging and Claude Code promotion to sk init (#1 - High)
      • Added initial progress message during initialization
      • Changed final messages to use output.info() instead of logger.info() for visibility
      • Updated src/solokit/init/orchestrator.py with user-facing completion summary
    • Added warning when dependency already exists in work-update (#2 - Medium)
      • Shows output.warning("Dependency 'X' already exists (skipped)") instead of silently skipping
      • Updated src/solokit/work_items/updater.py
    • Replaced verbose output with compact table format in work-next (#6 - Medium)
      • New table shows ID, Type, Priority, Status, Blocks, and Title columns
      • Displays top 5 ready items and top 3 blocked items
      • Arrow (→) marks recommended item, updated src/solokit/work_items/scheduler.py
    • Added interactive prompt to work-delete when no flags provided (#12 - Medium)
      • Users now get choices: 1=keep spec, 2=delete spec, 3=cancel
      • No longer requires --keep-spec or --delete-spec flags (but still accepts them)
      • Updated src/solokit/work_items/delete.py with user-friendly menu
    • Removed redundant ERROR/WARNING logs in edge cases (#14/#15/#16 - Medium)
      • Removed duplicate logging before user-facing error messages
      • Updated query.py, updater.py, and delete.py to avoid log duplication
      • Changed "No changes made" to "No changes to update" for clarity
    • Updated work-graph to use HelpfulArgumentParser for better errors (#17 - Medium)
      • Invalid format errors now show examples instead of raw argparse output
      • Updated src/solokit/visualization/dependency_graph.py
    • Improved "no results" message in learn-search (#11 - Low)
      • Now suggests trying different keywords or browsing all learnings
      • Updated src/solokit/learning/reporter.py
    • Added validation for empty query in learn-search (#18 - Low)
      • Shows error with examples when query is empty or whitespace-only
      • Updated src/solokit/learning/curator.py
    • Test updates: Fixed 1 test in test_briefing_generator.py to match new git status message
    • All 2,388 tests passing with zero regressions
    • Quality checks: All ruff linting passed, all formatting compliant, all mypy checks passed
    • Impact: Completes all 18 Phase 2 terminal testing issues for professional CLI UX
  • Critical: Phase 2 Terminal Testing - Clean Output, Archiver Fix, and Briefing Improvements

    • Fixed log leakage issue where INFO/WARNING/ERROR logs appeared in all commands without --verbose flag
      • Changed default CLI log level from INFO to ERROR for clean terminal output
      • Removed redundant logging configuration from validate.py
      • Updated src/solokit/cli.py to set ERROR level by default, DEBUG with --verbose
      • Only ERROR and above messages shown to users unless explicitly requesting verbose mode
    • Fixed archiver type comparison error causing learning curation to fail
      • Updated src/solokit/learning/archiver.py to handle new session dict format
      • Changed from comparing dict objects directly to extracting session_num field first
      • Resolves '>' not supported between instances of 'dict' and 'int' error
    • Fixed work-list count logic to include blocked items in not_started category
      • Updated src/solokit/work_items/query.py to count items by actual status
      • Blocked is now correctly treated as a property, not a separate status
      • Count math now accurate: total = in_progress + not_started + completed
    • Added template comment stripping to briefing output for cleaner specs
      • Created strip_template_comments() method in src/solokit/session/briefing/formatter.py
      • Removes HTML comments, placeholder text, and excessive blank lines from specs
      • Briefings now ~5x shorter and more readable without template cruft
    • Verified work-graph documentation already matches implementation (ascii, dot, svg formats)
    • Added comprehensive regression test suite: tests/integration/test_phase_2_terminal_fixes.py
      • 15 new tests covering all 5 issues
      • Updated 5 existing tests to use new session dict format
    • All 2,388 tests passing with zero regressions
    • Quality checks: All ruff linting passed, all formatting compliant, all mypy checks passed
    • Impact: Resolves 5 critical Phase 2 terminal testing issues for professional CLI UX
  • Critical: Phase 1 Terminal Testing - Error Messaging & UX Improvements

    • Fixed missing jsonschema>=4.20.0 dependency causing all learning commands to fail
    • Enhanced argparse error messages with helpful examples and next steps:
      • Created src/solokit/core/argparse_helpers.py with HelpfulArgumentParser class
      • Updated sk work-new, sk work-show, sk work-update, sk work-delete with example-rich epilogs
      • All argparse errors now show full help text with examples instead of raw usage
    • Improved Python binary detection for cross-platform compatibility:
      • Created src/solokit/core/system_utils.py with get_python_binary() function
      • Updated get_metadata.py, get_next_recommendations.py, get_dependencies.py to detect python vs python3
      • Error messages now show correct binary based on system availability
    • Added --debug flag to sk validate to hide stack traces from end users by default
    • Implemented context-aware "no work item" error messages:
      • sk start: Differentiates between "no items exist" vs "items exist but blocked"
      • sk status: Shows total item count and actionable next steps
      • sk end: Provides complete workflow guidance instead of "Work item not found: None"
      • sk work-next: Helpful creation steps instead of generic "No work items found."
      • sk work-list: Better message instead of wrong command reference "/work-item create"
      • sk work-graph: Context-aware message differentiating no items vs filtered results
    • All error messages now include:
      • Numbered action steps for both terminal (sk commands) and Claude Code (slash commands)
      • Emoji hints (⚠️, 💡) for visual guidance
      • Specific next steps instead of generic warnings
    • Test updates: Fixed 1 test in test_status.py to match improved error messages
    • All 2,155 unit tests passing
    • Impact: Resolves 13 out of 19 Phase 1 terminal testing issues

Added

  • Feature: UX Enhancements - Logger Shortening, Interactive Prompts, and Claude Code Promotion
    • Shortened logger names for better terminal readability (e.g., "orchestrator" vs "solokit.init.orchestrator")
    • Added questionary library for rich interactive CLI prompts with styled UI components
    • Created src/solokit/core/cli_prompts.py utility module with 4 reusable functions:
      • confirm_action(): Styled confirmation prompts with default fallback
      • select_from_list(): Single-select lists with arrow key navigation
      • multi_select_list(): Multi-select checkboxes for multiple options
      • text_input(): Text input with optional validation and defaults
    • Replaced basic input() calls in src/solokit/project/init.py with questionary prompts:
      • Template selection now uses interactive list selection
      • Quality tier selection with rich descriptions
      • Coverage target selection with visual list
      • Additional options use multi-select checkboxes
      • Final confirmation with styled yes/no prompt
    • Added Claude Code promotion to initialization completion:
      • Prominent messaging after sk init completes
      • Lists key slash commands (/start, /end, /work-new, /work-list)
      • Includes link to https://claude.com/claude-code
      • Better flow: Claude Code promotion → Next Steps
    • Enhanced README.md with Claude Code positioning:
      • Added "💡 Best Used with Claude Code" hero section with Quick Start variant
      • Enhanced Prerequisites to strongly recommend Claude Code (not just required)
      • Added "vs. Using Claude Code Standalone" comparison explaining workflow benefits
      • Repositioned documentation to emphasize Claude Code as primary interface
    • All prompts gracefully fall back to defaults in non-interactive environments (CI/CD, piped stdin)
    • Added EOF/KeyboardInterrupt error handling for robust test execution
    • Test suite: 2,373 tests passing (added 17 new tests for cli_prompts module)
    • Quality: All ruff linting passed, all mypy checks passed with modern type annotations

Fixed

  • Quality: Complete code quality and test suite cleanup
    • Fixed all linting issues: Replaced deprecated typing.List with built-in list type in 3 template files
    • Fixed all mypy type errors (17 errors across 6 files):
      • Updated pyproject.toml: Replaced deprecated strict_concatenate with extra_checks
      • Fixed exceptions.py: Changed implicit Optional returncode: int = None to explicit int | None = None
      • Added type casting in template_installer.py and dependency_installer.py for json.load() and yaml.safe_load() returns
      • Enhanced return type in environment_validator.py: dict[str, bool | str]dict[str, bool | str | None | list[str]]
      • Added Literal type casting in orchestrator.py for stack_type and tier parameters
    • Fixed test failures (3 tests):
      • Fixed mock fixtures: Changed exit_code to returncode in 6 test mocks
      • Updated conftest.py: Aligned mock_stack_versions with actual stack-versions.yaml structure (base, tier1-4 instead of all_tiers/tier4)
    • Removed all legacy init tests (12 tests deleted):
      • Deleted TestGitignoreGeneration class (8 tests) from test_init_workflow.py
      • Deleted TestGitInitialization class (3 tests) from test_init_workflow.py
      • Deleted TestCompleteInitWorkflow test (1 test) from test_init_workflow.py
    • Fixed E2E test fixtures to avoid legacy init (25 tests un-skipped):
      • Updated fixtures in test_core_session_workflow.py, test_learning_system.py, test_work_item_system.py
      • Fixtures now manually create .session directory structure instead of calling deprecated sk init
      • Added all required tracking files with proper structure (work_items.json, learnings.json, status_update.json, stack.txt, tree.txt)
    • Test suite results: 2,954 tests passing, 0 failed, 0 skipped (previously 2,368 passing, 35 skipped)
    • Quality checks: All ruff linting passed, all 247 files formatted, all mypy checks passed (106 source files)
    • Benefits: Clean codebase with modern Python type hints, zero legacy code, 100% test success rate

Added

  • Feature: Claude Code Interactive UI Integration
    • Integrated Claude Code's AskUserQuestion tool to replace Python's interactive terminal prompts with rich UI components
    • Updated 6 slash commands with interactive workflows:
      • /work-new: Interactive dependency and metadata selection with AI-powered suggestions
      • /work-update: Multi-select field updates (status, priority, milestone, dependencies)
      • /work-delete: Shows dependent work items with warning before deletion
      • /end: Work item completion status selection (completed/in-progress/cancel)
      • /learn: AI-generated learning suggestions with multi-select capture
      • /start: Interactive work item recommendations (top 4 ready items by priority)
    • Created 4 optimization scripts to avoid reading full JSON files:
      • get_metadata.py: Fast work item metadata retrieval (~10 lines vs 1,751 lines)
      • get_dependencies.py: Quick dependency lookup with filtering and status
      • get_dependents.py: Find work items that depend on a given item
      • get_next_recommendations.py: Get top N ready work items by priority
    • Removed all Python input() calls from command modules (creator.py, updater.py, delete.py, complete.py)
    • All commands now require explicit CLI arguments with no interactive fallbacks
    • Updated command files (.claude/commands/*.md) with declarative AskUserQuestion workflows
    • Added 53 comprehensive unit tests for optimization scripts
    • All 2,226 tests passing (1,996 unit + 140 integration + 90 e2e)
    • Full type safety maintained with mypy strict mode
    • Benefits: Rich interactive UI for Claude Code users, better UX with multi-select options, AI-generated suggestions, optimized performance

Changed

  • Session Completion: /sk:end now defaults to marking work items as completed
    • Non-interactive mode (e.g., when run by Claude Code) now defaults to marking work items as "completed" instead of "in-progress"
    • This aligns with the most common use case where developers end sessions after completing their work
    • Use the --incomplete flag explicitly to keep work items as "in-progress" for multi-session work
    • Interactive mode behavior unchanged (still defaults to completed as choice 1)
    • Updated src/solokit/session/complete.py:943 to return True in non-interactive mode
    • Updated documentation in .claude/commands/end.md to reflect new default behavior
    • Updated test test_prompt_non_interactive_defaults_true in tests/unit/session/test_complete.py

Added

  • Performance: Comprehensive optimization for session operations
    • Created src/solokit/core/cache.py with thread-safe TTL-based caching:
      • Cache class with get/set/invalidate/clear operations
      • FileCache class with automatic modification time tracking
      • Global cache instance accessible via get_cache()
    • Created src/solokit/core/performance.py for performance monitoring:
      • @measure_time() decorator for automatic function timing
      • Timer context manager for code block timing
      • Automatic logging for operations >100ms (info) and >1s (warning)
    • Enhanced src/solokit/learning/similarity.py with caching optimizations:
      • Added _word_cache to cache word sets during merge operations
      • Pre-compute word sets once per category (O(n) instead of O(n²))
      • Reduced similarity checking from 4,950 operations to ~100 for 100 learnings
    • Enhanced src/solokit/work_items/repository.py with file caching:
      • load_all() uses FileCache with modification tracking
      • Eliminates 11+ repeated file loads per operation
      • save_all() automatically invalidates cache
    • Added 91 comprehensive tests:
      • 16 cache module tests (TTL, thread safety, file caching)
      • 13 performance module tests (decorator, timer, exception handling)
      • Enhanced similarity tests with word cache validation
    • Performance improvements:
      • Similarity checking: 30-50x faster for large learning datasets
      • File I/O: 10x reduction with intelligent caching
      • Automatic performance monitoring built-in across codebase
    • All 1,980 unit tests passing, full type safety with mypy strict mode

Changed

  • Refactor: Extract constants and remove magic values - Complete centralization

    • Created comprehensive src/solokit/core/constants.py module with 31 constants organized into 9 categories
    • Replaced 50+ magic timeout values and hardcoded path strings across 27 files with named constants
    • Added 8 helper functions for type-safe path construction (e.g., get_session_dir(), get_work_items_file())
    • Organized constants into logical categories:
      • Git operation timeouts (3): Quick/Standard/Long (5s/10s/30s)
      • Quality gate timeouts (5): From 5s checks to 20min test runs
      • Integration testing timeouts (5): Docker, fixtures, cleanup operations
      • Session workflow timeouts (4): Status, completion, learning extraction
      • Project initialization timeouts (3): Stack detection, tree/graph generation
      • Performance testing (4): Regression thresholds, test timeouts
      • Learning system (5): Curator settings, similarity thresholds
      • Directory and file paths (11): Session directory structure
    • Updated files across all major modules:
      • Core: git/integration.py (13 replacements), session/validate.py
      • Quality: All 8 checker modules + gates.py (22 replacements)
      • Session: complete.py, status.py, briefing modules (8 replacements)
      • Testing: performance.py, integration_runner.py (9 replacements)
      • Other: learning, visualization, project modules (4 replacements)
    • All constants use Final type annotations for type safety
    • Benefits: Single source of truth, self-documenting code, easier maintenance, improved readability
    • All 2,180 tests passing, zero linting issues, clean formatting
  • Refactor: Complete logging consistency refactor - 100% migration to structured logging

    • Migrated all 502 print() statements across 30 files to new structured logging/output system
    • Separated user-facing output from diagnostic logging for better maintainability:
      • Created OutputHandler class in src/solokit/core/output.py for user-facing messages (stdout/stderr)
      • Enhanced logging_config.py with structured logging, JSON formatting, and context management
    • Migrated 21 additional files across 4 batches in Session 29:
      • Batch 1 (100 statements): reporter.py, dependency_graph.py, tree.py
      • Batch 2 (37 statements): config_validator.py, cli.py, error_formatter.py, stack.py
      • Batch 3 (24 statements): milestones.py, curator.py, repository.py, work_items stragglers
      • Batch 4 (38 statements): env_validator.py, executor.py, performance.py, exceptions.py, and 4 others
    • Fixed all migration issues:
      • Corrected indentation errors and incomplete f-strings from automated migration
      • Fixed variable shadowing bug in dependency_graph.py (output vs graph_output)
      • Added missing output = get_output() initialization in 8+ modules
      • Updated 45 tests to work with new output system instead of capturing stdout
    • All 2,180 tests passing (100% pass rate) after migration
    • Passed all quality gates: ruff linting, mypy type checking, code formatting
    • Benefits: Cleaner separation of concerns, consistent user experience, better diagnostic logging, structured log output support
  • Refactor: Decompose manager.py god-class into modular architecture

    • Decomposed monolithic 1,212-line WorkItemManager god-class into 8 focused, single-responsibility modules
    • Created 7 new specialized modules: repository.py, creator.py, validator.py, query.py, updater.py, scheduler.py, milestones.py
    • Refactored main manager.py from 1,212 to 260 lines (-79% reduction) by delegating to specialized modules
    • Implemented dependency injection pattern with clear module responsibilities:
      • WorkItemRepository: Data access and persistence layer (CRUD operations) (235 lines)
      • WorkItemCreator: Interactive and non-interactive work item creation with prompts (436 lines)
      • WorkItemValidator: Validation logic for integration tests and deployments (197 lines)
      • WorkItemQuery: Listing, filtering, searching, sorting, and display (389 lines)
      • WorkItemUpdater: Update operations with field validation (211 lines)
      • WorkItemScheduler: Work queue management and next item selection (176 lines)
      • MilestoneManager: Milestone CRUD operations and progress tracking (133 lines)
    • Created comprehensive test suite: 168 new unit tests for all new modules (213 tests total, up from 111)
    • Added 4 new test files: test_repository.py, test_creator.py, test_query.py, test_milestones.py
    • Updated test_manager.py to focus on integration testing of the orchestration layer (45 integration tests)
    • Fixed 4 mypy type annotation errors in repository.py for strict type checking compliance
    • All 2,165 tests passing (100% pass rate) including 213 work_items module tests
    • Maintained full backward compatibility with existing WorkItemManager public API
    • Benefits: Single responsibility principle, improved testability, better code navigation, extensibility, loose coupling, easier maintenance
  • Refactor: Decompose learning curator god-class into modular architecture

    • Decomposed monolithic 1,226-line LearningsCurator god-class into 8 focused, single-responsibility modules
    • Created 6 new specialized modules: categorizer.py, archiver.py, extractor.py, repository.py, reporter.py, validator.py
    • Refactored main curator.py from 1,226 to 369 lines (-70% reduction) by delegating to specialized modules
    • Implemented dependency injection pattern with clear module responsibilities:
      • LearningCategorizer: Auto-categorization with keyword scoring (124 lines)
      • LearningArchiver: Archive management for old learnings (116 lines)
      • LearningExtractor: Extract from sessions, git commits, code comments (343 lines)
      • LearningRepository: CRUD operations and data persistence (247 lines)
      • LearningReporter: Reports, statistics, search, timeline (349 lines)
      • LearningValidator: Validation logic and JSON schema (142 lines)
    • Added 13 compatibility wrapper methods to maintain backward compatibility with existing tests
    • Fixed FileOperationError exception handling in extractor for graceful JSON parsing failures
    • All 2,143 tests passing (100% pass rate) including 212 learning-related tests
    • Fixed all quality issues: ruff formatting (4 files), mypy type checking (2 errors)
    • Benefits: Single responsibility principle, improved testability, better code navigation, extensibility, loose coupling
  • Refactor: Complete Quality Gates modularization into specialized checker architecture

    • Decomposed monolithic 1,370-line gates.py god class into 10 focused, single-responsibility checker classes
    • Created modular checker architecture with abstract QualityChecker base class and CheckResult dataclass
    • Implemented 10 specialized checkers: SecurityChecker, ExecutionChecker, LintingChecker, FormattingChecker, DocumentationChecker, SpecCompletenessChecker, CustomValidationChecker, Context7Checker, IntegrationChecker, DeploymentChecker
    • Refactored main gates.py from 1,370 to 611 lines (-55%) by delegating to specialized checkers
    • Removed legacy gates_legacy.py (1,370 lines) after successfully migrating all functionality
    • Created reporter infrastructure: ConsoleReporter and JSONReporter for flexible output formatting
    • Added ResultAggregator for combining and analyzing checker results
    • Implemented dependency injection pattern with optional CommandRunner parameter for fast, isolated testing
    • Created comprehensive test suite: 220 new unit tests for all checker modules (360 tests total, up from 140)
    • Achieved 95%+ code coverage across all new modules (100% on 4 checkers, 94-99% on others)
    • Fixed all quality issues: ruff linting (91 errors), black formatting (28 files), mypy type checking (27 errors)
    • Renamed TestRunner to ExecutionChecker to avoid pytest collection warnings
    • Added configuration dataclasses: Context7Config, IntegrationConfig, DeploymentConfig
    • All 360 tests passing (100% pass rate) with 0.40s execution time
    • Maintained full backward compatibility with existing QualityGates interface
    • Benefits: Single responsibility principle, easy to test, pluggable architecture, clear separation of concerns, type-safe, highly maintainable
  • Refactor: Extract learning similarity engine into dedicated module

    • Created new src/solokit/learning/similarity.py module with reusable similarity detection algorithms
    • Implemented JaccardContainmentSimilarity class with configurable thresholds and stopword filtering
    • Implemented LearningSimilarityEngine with caching, pluggable algorithms, and Protocol-based design
    • Added comprehensive test suite (35 tests) covering similarity algorithms, caching, merging, and edge cases
    • Refactored LearningsCurator to delegate similarity operations to the new engine
    • Removed duplicate similarity logic from curator (simplified 4 methods, removed 1 internal method)
    • Fixed all ruff linting issues (14 deprecated typing imports converted to modern syntax)
    • Achieved 100% mypy type checking compliance with proper type annotations
    • All 1783 tests passing with no regressions
    • Benefits: Better separation of concerns, improved testability, reusable similarity algorithms
  • Refactor: Add comprehensive type hints across entire codebase

    • Added complete type hint coverage to all 55 source files in the codebase (100% coverage)
    • Fixed 348 mypy errors across 6 refactoring sessions, achieving 0 type checking errors
    • Modernized type annotations: converted Optional[X] to X | None syntax (14 occurrences)
    • Added from __future__ import annotations to 12 modules for forward reference support
    • Fixed Priority enum comparison methods to accept object parameter for protocol compatibility
    • Fixed ErrorContext.exit() return type to Literal[False] for strict context manager protocol
    • Added explicit return type annotations to 100+ functions including nested functions
    • Added type annotations for complex variables: dict[str, Any], list[dict[str, str]], etc.
    • Used # type: ignore[no-any-return] for unavoidable Any returns from json.load() and yaml.safe_load()
    • Applied ruff auto-formatting to 8 files for consistent code style
    • All 1520 unit tests passing with no regressions
    • Benefits: IDE autocomplete, early error detection, better refactoring safety, improved documentation

Added

  • Core Error Handling Infrastructure
    • Implemented comprehensive SDDError exception hierarchy with 50+ specialized exception types
    • Added ErrorCode enumeration with 40+ error codes for standardized error identification
    • Added ErrorCategory system (SYSTEM, USER, VALIDATION, NETWORK) for error classification
    • Implemented ErrorFormatter for consistent error display with exit code mapping
    • Added error handling decorators (@log_errors, @convert_subprocess_errors, @convert_file_errors)
    • Created structured logging integration with context preservation and exception chaining
    • All exceptions include context dict, remediation guidance, and proper exit codes

Changed

  • Standardized Error Handling Migration (Phases 1-3)
    • Migrated 33 production files from print() and return tuples to structured exception-based error handling
    • Phase 1 (11 files): Core utilities and briefing components
    • Phase 2 (8 files): Work item management and validation
    • Phase 3A (5 files): Core business logic (git/integration, quality/gates, learning/curator, session/complete, work_items/manager)
    • Phase 3B (3 files): Testing infrastructure
    • Phase 3C (6 files): Project management and configuration
    • Replaced 200+ print() error statements with proper exception raising
    • Replaced 26 return tuple patterns with exception-based error handling
    • Replaced 8 sys.exit() calls in business logic with exceptions (CLI entry points preserved)
    • Replaced 75+ broad Exception catches with specific exception types or catch-and-reraise pattern
    • Added @log_errors() decorators to 40+ key functions for structured logging
    • Updated 9 test files with pytest.raises() patterns and exception validation
    • Quality gates intentionally kept 47 return tuples for result aggregation (not errors)
    • All 1750 tests passing (100% coverage maintained)

Fixed

  • Linting and Formatting
    • Fixed 77 type annotation warnings (Optional[X] → X | None) using ruff --unsafe-fixes
    • Added missing ValidationError import in session/briefing.py
    • Formatted 31 files with ruff format for consistent code style
    • All ruff checks passing with zero errors

Investigated

  • Dataclass Migration Analysis
    • Investigated replacing dictionary-based data structures with Python dataclasses across the codebase
    • Analysis identified 1,260 dictionary patterns across 57 files requiring migration
    • Estimated effort: 30-35 hours with high risk of introducing bugs
    • Decision: Deferred indefinitely - current dict-based approach is stable and well-tested
    • Rationale: Low ROI for a working CLI tool, prefer TypedDict for gradual type improvements
    • All 1,471 tests passing (1,333 unit + 138 integration)

Changed

  • Refactor: Consolidate subprocess execution with CommandRunner

    • Replaced all direct subprocess.run() calls with centralized CommandRunner class
    • Updated 10 production files to use CommandRunner for consistent command execution:
      • visualization/dependency_graph.py - Graphviz SVG generation
      • session/validate.py - Git status validation
      • session/status.py - Git diff operations
      • session/complete.py - Stack/tree updates and git operations
      • learning/curator.py - Git log extraction
      • testing/performance.py - wrk load testing and docker operations
      • testing/integration_runner.py - Docker-compose and test execution
      • project/tree.py - Tree command execution
      • project/stack.py - Language version detection
      • project/init.py - Git init and dependency installation
    • Updated 9 test files with proper CommandRunner mocking patterns using CommandResult objects
    • Benefits: consistent error handling, timeout management, retry logic, and centralized logging
    • Fixed pytest collection warning by renaming TestExecutionConfig to ExecutionConfig
    • All 1,563 tests passing with zero warnings
  • Refactor: Decompose briefing.py god-class into modular package

    • Decomposed monolithic 1,166-line session/briefing.py into focused package structure with 9 modules
    • Created session/briefing/ package with single-responsibility modules averaging ~150 lines each:
      • orchestrator.py - SessionBriefing class for coordinating components
      • work_item_loader.py - WorkItemLoader for loading and resolving work items
      • learning_loader.py - LearningLoader for loading and scoring relevant learnings
      • documentation_loader.py - DocumentationLoader for project docs discovery
      • stack_detector.py - StackDetector for technology stack detection
      • tree_generator.py - TreeGenerator for directory tree loading
      • git_context.py - GitContext for git status and branch operations
      • milestone_builder.py - MilestoneBuilder for milestone context
      • formatter.py - BriefingFormatter for text formatting and generation
    • 100% backward compatibility maintained through wrapper functions in __init__.py
    • Added GitStatus.PR_CLOSED and GitStatus.DELETED enum values for complete git workflow states
    • Class-based API enables better testability, reusability, and dependency injection
    • All 1,440 unit and integration tests passing with no regressions
    • Created comprehensive migration guide in docs/development/BRIEFING_REFACTOR_MIGRATION_GUIDE.md
    • Benefits: improved maintainability, testability, code organization, and extensibility
  • Refactor: Replace magic strings with type-safe enums

    • Created comprehensive enum system in core/types.py with 4 enums: WorkItemType, WorkItemStatus, Priority, GitStatus
    • Updated 12 modules to use type-safe enums instead of magic strings
    • Priority enum supports comparison operations (<, >, <=, >=) for prioritization logic
    • GitStatus enum updated to match actual workflow states (in_progress, ready_to_merge, ready_for_pr, pr_created, merged)
    • All enums inherit from str for seamless JSON serialization compatibility
    • Each enum provides .values() class method for validation and iteration
    • 100% backward compatibility maintained - no changes to JSON data formats
    • All 1,532 tests passing with no regressions
    • Created comprehensive documentation in docs/development/ENUM_USAGE_GUIDE.md with usage patterns, examples, and migration guide
    • Benefits: IDE autocomplete, type safety, easier refactoring, single source of truth for valid values
  • Refactor: Centralized configuration management with ConfigManager

    • Created core/config.py with singleton ConfigManager for centralized config loading
    • Type-safe dataclasses for all config sections (QualityGatesConfig, CurationConfig, GitConfig)
    • Caching mechanism to avoid redundant file reads with invalidation support
    • Refactored 5 modules to use ConfigManager: quality/gates.py, git/integration.py, learning/curator.py, session/complete.py, session/validate.py
    • 21 comprehensive unit tests for ConfigManager with 98% coverage
    • Fixed 8 previously skipped tests in test suite
    • Removed 3 obsolete test classes (duplicate config loading tests)
    • All 1256 unit tests pass (up from 1248) with 0 skipped tests
    • Net reduction of 183 lines of code through deduplication
    • Eliminated duplicate config loading logic across modules
  • Refactor: Consolidated JSON file I/O operations

    • Centralized all JSON file operations in core/file_ops.py with JSONFileOperations class
    • Added FileOperationError exception for consistent error handling
    • Enhanced features: atomic writes by default, optional validation hooks, automatic directory creation
    • New load_json_safe() method for guaranteed return (never raises)
    • Removed duplicate _load_json and _save_json methods from learning/curator.py
    • 97% test coverage with 41 comprehensive unit tests
    • All 1240 unit tests pass with no regressions
    • Eliminated ~100+ lines of duplicate code across codebase
    • Created comprehensive API reference documentation in docs/reference/file-operations-api.md
    • Updated architecture documentation

[0.1.0] - 2025-10-26

Note: Versions 0.6.0 and 0.7.0 were development versions that have been consolidated into the 0.1.x public release series.

Added

  • Enhanced session briefings with context continuity

    • Previous Work section for in-progress items showing commits, file stats, and quality gates from prior sessions
    • Enriched session summaries with full commit messages and file change statistics
    • Enhanced learning relevance scoring using multi-factor algorithm (keywords, type, recency, category bonuses)
    • Top 10 relevant learnings (up from 5) with intelligent scoring
    • Fixes briefing update bug - briefings now regenerated for in-progress items
    • Fixes timing issue - work_items data reloaded after recording commits to ensure accurate summaries
    • Makes multi-session work practical by eliminating context loss
    • 22 new comprehensive unit tests for helper functions and enhanced functionality
    • Updated documentation in .claude/commands/start.md and .claude/commands/end.md
  • Work item deletion - Safe deletion of work items with dependency checking

    • New sk work-delete <work_item_id> command
    • Interactive mode with 3 options: keep spec, delete spec, or cancel
    • Non-interactive mode with --keep-spec and --delete-spec flags
    • Dependency checking warns about dependent work items
    • Automatic metadata updates (total_items, status counts)
    • 19 comprehensive unit tests
    • Full documentation in .claude/commands/work-delete.md and docs/commands/work-delete.md
  • Work item completion status control - Explicit control over work item completion during session end

    • Interactive 3-choice prompt: "Mark completed", "Keep in-progress", "Cancel"
    • Command-line flags: --complete and --incomplete
    • Supports multi-session workflows
    • 8 unit tests added
  • PyPI Publishing Workflow - Automated package publishing to PyPI on GitHub releases

  • Comprehensive test infrastructure - Test suite reorganization and expansion

    • 1,408 comprehensive tests (up from 183, 765% increase)
    • 85% code coverage (up from 30%)
    • Unit/integration/e2e structure across 35 test files
    • 4 modules at 100% coverage, 20 modules at 75%+ coverage
  • Auto git initialization - sk init now automatically initializes git repository and creates initial commit

  • Pre-flight commit check - sk end validates all changes are committed before running quality gates

  • CHANGELOG workflow improvements - Git hooks with reminders + smarter branch-level detection

  • OS-specific .gitignore patterns - macOS, Windows, and Linux patterns automatically added during sk init

Changed

  • BREAKING: Package structure migrated to standard Python src/ layout
    • Moved all Python modules from flat directory to organized src/solokit/ package structure
    • Created domain-organized subdirectories: core/, session/, work_items/, learning/, quality/, visualization/, git/, testing/, deployment/, project/
    • Updated all imports from scripts.X to solokit.X pattern (43 files)
    • Removed all sys.path.insert() hacks (38 instances)
    • Removed setup.py in favor of PEP 517/518 pyproject.toml-only configuration
    • CLI command remains solokit (no user-facing changes)
    • All tests pass, PyPI-ready structure, better IDE support
  • Simplified git branch naming - Branch names now use work item ID directly
    • Format: feature_oauth instead of session-001-feature_oauth
    • Clearer intent, shorter names, backward compatible
  • Standardized spec validation - All work item types now use "Acceptance Criteria" section consistently
    • Updated refactor specs to use "Acceptance Criteria" (was "Success Criteria")
  • Makefile clean target - Now removes coverage artifacts (htmlcov/, coverage.xml, coverage.json)

Fixed

  • Quality gates test timeout - Increased from 5 to 10 minutes (1408 tests take ~6 minutes)
  • Docstring validation - Fixed pydocstyle configuration to properly validate project docstrings
  • Bug #25: Git branch status now finalizes when switching work items (12 unit tests)
  • Bug #24: /start command now properly handles explicit work item selection (3 unit tests)
  • Bug #23: Bug/refactor spec templates now include "Acceptance Criteria" section
  • Bug #21: Learning curator no longer extracts test data strings (21 unit tests)
  • Bug #20: Multi-line LEARNING statements now captured completely (30 unit tests)
  • UX improvements: Auto git init, pre-flight checks, CHANGELOG reminders, clear error messages

Removed

  • Deleted obsolete development tracking files (NEXT_SESSION_PROMPT.md, TEST_PROGRESS.md)
  • Removed 38 instances of sys.path.insert() manipulation
  • Removed flat directory structure
  • Removed E402 ignore from ruff config

[0.5.8] - 2025-10-21

Added

  • Marketplace Plugin Support: Solokit now works as a Claude Code marketplace plugin
  • One-time setup command for plugin users: pip install -e ~/.claude/plugins/marketplaces/claude-plugins/solokit
  • Simplified installation documentation with clear paths for both marketplace and direct installation

Changed

  • Unified CLI: All 15 slash command files now use solokit command instead of relative paths
  • Updated command files: init.md, start.md, end.md, status.md, validate.md, learn*.md, work-*.md
  • Simplified README installation section with two clear options (marketplace vs. direct)
  • Updated all CLI examples throughout documentation to use solokit command
  • Updated marketplace README (claude-plugins/README.md) with v0.5.8 installation instructions
  • Updated Architecture Notes to reflect v0.5.8 changes

Technical Details

  • Files Modified: 18 files total
    • 15 command files (.claude/commands/*.md)
    • 1 main README (README.md)
    • 1 marketplace README (in separate repo)
    • 1 pyproject.toml (version bump)
  • Breaking Changes: Command files no longer use relative Python paths - now use solokit CLI
  • Migration: Users must run pip install -e . if not already done

Migration Guide

For marketplace plugin users:

pip install -e ~/.claude/plugins/marketplaces/claude-plugins/solokit

For existing direct installations:

cd /path/to/solokit
pip install -e .

All slash commands will now work via the solokit CLI.

Benefits

  • ✅ Plugin works from marketplace installation
  • ✅ No need to clone Solokit into every project
  • ✅ Cleaner, more standard approach
  • ✅ Works identically whether installed directly or via marketplace
  • ✅ Aligns with Python package best practices

Reference

See ROADMAP.md Phase 5.8 for complete details.


[0.5.7] - 2025-10-18

Added

  • Spec-first architecture: .session/specs/*.md files are now the single source of truth for work item content
  • Comprehensive markdown parser (spec_parser.py, 700+ lines) supporting all 6 work item types
  • Spec file validation system with required section checks and quality gates
  • Complete context loading - removed all compression (50-line tree limit, 500-char doc limits)
  • Writing guide (docs/guides/writing-specs.md, 500+ lines) with examples for all work item types
  • Template structure documentation (docs/reference/spec-template-structure.md)

Changed

  • Eliminated dual storage problem - work item content now only in spec files, not work_items.json
  • Enhanced all 6 spec templates with comprehensive examples and inline guidance
  • Updated briefing system to load full spec content without truncation
  • Refactored validators and runners to use spec parser
  • Quality gates now validate spec completeness before session completion

Removed

  • Content fields from work_items.json (rationale, acceptance_criteria, implementation_paths, test_paths)
  • Compression limits on project documentation
  • Duplicate briefing sections

Technical Details

  • Tests Added: 49 tests across 6 test files
  • Code Added: ~3,200 lines (spec_parser.py, spec_validator.py, templates, docs)
  • Files Created: 8 new files (validator, docs, test files)
  • Files Enhanced: 12 files (briefing_generator, quality_gates, templates, commands)

Reference

See ROADMAP.md Phase 5.7 for complete details.


[0.5.6] - 2025-10-15

Added

  • Deployment work item type with comprehensive validation framework
  • Deployment execution framework with pre-deployment validation and rollback automation
  • Environment validation system with 7 validation types (connectivity, configuration, dependencies, health checks, monitoring, infrastructure, capacity)
  • Deployment quality gates integrated with quality_gates.py
  • Multi-environment support (staging vs production with different configurations)
  • Automated smoke test execution with timeout and retry support
  • Dry-run mode for deployment simulation

Changed

  • Enhanced deployment_spec.md template with 11 sections including deployment procedure, rollback, smoke tests
  • Session workflow now includes deployment-specific briefings and summaries
  • Quality gates include deployment validation before execution

Technical Details

  • Tests Added: 65 tests across 5 test files
  • Code Added: ~2,049 lines (deployment_executor.py, environment_validator.py, enhanced templates)
  • Validation Types: 7 comprehensive environment checks
  • Focus: Production deployment safety and automation

Reference

See ROADMAP.md Phase 5.6 for complete details.


[0.5.5] - 2025-10-15

Added

  • Integration testing framework with comprehensive validation
  • Enhanced integration test work item type with multi-component dependency tracking
  • Integration test execution framework with Docker Compose orchestration
  • Performance benchmarking system with regression detection (10% threshold)
  • API contract validation with breaking change detection
  • Integration quality gates with environment validation
  • Integration documentation requirements (architecture diagrams, sequence diagrams, API contracts)

Changed

  • Enhanced integration_test_spec.md template with test scenarios, performance benchmarks
  • Session workflow includes integration-specific briefings and summaries
  • Quality gates validate integration test environment before execution

Technical Details

  • Tests Added: 178 tests across 7 test files
  • Code Added: ~5,458 lines (integration_test_runner.py, performance_benchmark.py, api_contract_validator.py)
  • Performance Tracking: Latency percentiles (p50, p75, p90, p95, p99), throughput, response time
  • Focus: Multi-service integration validation and performance regression detection

Reference

See ROADMAP.md Phase 5.5 for complete details.


[0.5] - 2025-10-14

Added

  • Quality gates system for automated quality enforcement at session completion
  • Test execution with coverage parsing and multi-language support (Python, JavaScript, TypeScript)
  • Security scanning integration (bandit, safety, npm audit) with severity-based filtering
  • Linting and formatting with auto-fix modes (ruff, eslint, prettier)
  • Documentation validation (CHANGELOG, docstrings, README)
  • Context7 MCP integration (stub ready for production)
  • Custom validation rules (per-work-item and project-level)
  • Quality gate reporting with remediation guidance

Changed

  • Session completion now enforces quality standards before allowing completion
  • Extracted quality gate logic into dedicated quality_gates.py module (770 lines)
  • Added quality gates configuration to .session/config.json during /init

Fixed

  • pytest exit code 5 ("no tests collected") now treated as skipped, not failed
  • Auto-fix modes for linting and formatting improve developer experience

Technical Details

  • Tests Added: 54 tests across all quality gate types
  • Code Added: 875 lines (quality_gates.py, config integration)
  • Tools Supported: pytest, ruff, bandit, safety, eslint, prettier, npm audit
  • Configuration: Required vs optional gate enforcement

Reference

See ROADMAP.md Phase 5 for complete details.


[0.4] - 2025-10-14

Added

  • Learning capture and curation system for knowledge management
  • 4 learning commands: /learn, /learn-show, /learn-search, /learn-curate
  • Auto-categorization into 6 categories (architecture_patterns, gotchas, best_practices, technical_debt, performance_insights, security)
  • Similarity detection using Jaccard (0.6) and containment (0.8) thresholds
  • Automatic duplicate detection and merging
  • Multi-source learning extraction (session summaries, git commits with LEARNING:, inline # LEARNING: comments)
  • Enhanced browsing with filters (category, tags, date range, session number)
  • Statistics dashboard and timeline view
  • Auto-curation trigger every N sessions (default 5, configurable)

Changed

  • Sessions now include automated learning capture at completion
  • .session/config.json includes learning configuration (auto_curate_frequency, similarity_threshold)

Technical Details

  • Tests Added: 53 tests across all learning features
  • Code Added: ~1,587 lines (commands, documentation, integration)
  • Documentation: docs/reference/learning-system.md guide (550 lines)
  • Categories: 6 comprehensive categories covering software development learnings

Reference

See ROADMAP.md Phase 4 for complete details.


[0.3] - 2025-10-13

Added

  • Work item dependency graph visualization with critical path analysis
  • /work-graph command with multiple output formats (ASCII, DOT, SVG)
  • Graph filtering options (status, milestone, type, focus node, include-completed)
  • Critical path analysis with automatic highlighting in all formats
  • Bottleneck detection (items blocking 2+ others)
  • Graph statistics (total items, completion percentage, critical path length)
  • Neighborhood view with --focus for exploring specific work items

Changed

  • Enhanced dependency_graph.py with 313 new lines for CLI integration
  • Graph visualization updates automatically when work items change

Technical Details

  • Tests Added: 36 tests across 6 sections
  • Code Added: 426 lines (command integration, enhanced graph features)
  • Formats: ASCII (terminal-friendly), DOT (Graphviz), SVG (documentation)
  • Focus: Understanding project structure and identifying bottlenecks

Reference

See ROADMAP.md Phase 3 for complete details.


[0.2] - 2025-10-13

Added

  • Work item management system with full CRUD operations
  • 6 work item types (feature, bug, refactor, security, integration_test, deployment)
  • 5 work item commands: /work-new, /work-list, /work-show, /work-update, /work-next
  • Dependency tracking and resolution
  • Priority levels (critical, high, medium, low) with visual indicators (🔴🟠🟡🟢)
  • Milestone organization and progress tracking
  • Status tracking (backlog, in_progress, completed, blocked)
  • Conversational interface for work item creation (Claude Code compatible)

Changed

  • Sessions now include comprehensive work item tracking
  • Briefings include milestone context and dependency status
  • /status command shows work item context and progress

Technical Details

  • Tests Added: 9 tests for work item management
  • Code Added: work_item_manager.py (500+ lines)
  • CLI Commands: Non-interactive mode for Claude Code compatibility
  • Storage: JSON-based work item tracking in .session/work_items.json

Reference

See ROADMAP.md Phase 2 for complete details.


[0.1] - 2025-10-13

Added

  • Core session management framework with complete workflow
  • /init command for project initialization
  • Stack tracking system (generate_stack.py) with technology detection
  • Tree tracking system (generate_tree.py) with structure change detection
  • Git workflow integration (git_integration.py) with branch management
  • Enhanced /start with comprehensive context loading (docs, stack, tree, git)
  • Enhanced /end with tracking updates and quality gates
  • /validate command for pre-flight checks before session completion
  • Multi-session work item support (resume on same branch)

Changed

  • Session initialization creates .session/ directory structure
  • Briefings include full project context (vision, architecture, stack, tree)
  • Session completion updates all tracking files automatically

Technical Details

  • Tests Added: 6 core tests
  • Code Added: 2,174 lines across 12 scripts
  • Infrastructure: .session/ directory with tracking files
  • Git Integration: Automatic branch creation, commit, push, merge

Reference

See ROADMAP.md Phase 1 for complete details.


[0.0] - 2025-10-10

Added

  • Foundation and documentation for Session-Driven Development methodology
  • Repository structure with .claude/commands/ directory (16 slash commands)
  • Basic briefing generation (briefing_generator.py)
  • Basic session completion (session_complete.py)
  • Learning curation system (learning_curator.py) - complete and production-ready
  • Dependency graph visualization (dependency_graph.py) - complete and production-ready
  • File operation utilities (file_ops.py)
  • Comprehensive methodology documentation (docs/solokit-methodology.md)
  • Implementation insights documentation (docs/implementation-insights.md)
  • AI-augmented framework reference (docs/ai-augmented-solo-framework.md)

Technical Details

  • Work Item Schema: Defined in templates/work_items.json
  • Learning Schema: Defined in templates/learnings.json
  • Algorithms: Dependency resolution (DFS-based), Learning categorization (keyword-based), Similarity detection (Jaccard + containment)

Reference

See ROADMAP.md Phase 0 for complete details.


Version Numbering

Versions follow semantic versioning (MAJOR.MINOR.PATCH):

  • MAJOR: Incompatible API changes
  • MINOR: New functionality (backward compatible)
  • PATCH: Bug fixes (backward compatible)

Phase mapping to public release versions:

  • Phases 0-5.9 (Development phases) → v0.1.0 (Initial Public Release)
    • Phase 0: Foundation & documentation
    • Phase 1: Core session workflow
    • Phase 2: Work item system
    • Phase 3: Dependency graphs
    • Phase 4: Learning management
    • Phase 5: Quality gates
    • Phase 5.5: Integration testing
    • Phase 5.6: Deployment support
    • Phase 5.7: Spec-first architecture
    • Phase 5.8: Marketplace plugin support
    • Phase 5.9: Standard Python src/ layout & PyPI publishing
  • v0.3.0 = Current release ✅ Current (Minimal init mode, bug fixes)
  • v0.2.2 = Previous release (Fix missing test_execution.commands config)
  • v0.2.1 = Earlier release (Critical CVE patches for Next.js/React templates)
  • v0.2.0 = Earlier release (Tailwind CSS v4 migration, CHANGELOG check fixes)
  • v0.1.7 = Earlier release (Improved /end command flow, slash command format)
  • v0.1.6 = Earlier release (Minimal scaffolding migration complete)
  • v0.1.5 = Earlier release (GitHub setup integration, security fixes)
  • v0.1.4 = Earlier release (Test coverage improvements)
  • v0.1.3 = Earlier release (Documentation model improvements)
  • v0.1.1 = Earlier release (UX improvements & bug fixes)
  • v1.0.0 = Stable API release (planned)