Changelog

January 15, 2026

All notable changes to the Solokit (Session-Driven Development) project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Added

Meaningful Test Guidance in Documentation
- Added explicit guidance that tests should represent real use cases, not just satisfy coverage metrics
- Updated PRD_WRITING_GUIDE.md with "Write Meaningful Tests, Not Just Coverage" section
- Updated writing-specs.md with new tip on scenario-driven vs coverage-driven tests
- Key principle: "If this feature broke in production, would this test catch it?"
- Helps prevent false confidence from tests that pass but don't catch real bugs

[0.3.0] - 2026-01-14

Added

Minimal Init Mode (sk init --minimal) (PR #207)
- New --minimal flag for lightweight project initialization
- Installs only session tracking infrastructure without templates or quality tiers
- Ideal for simple projects (HTML sites, scripts, prototypes) that don't need testing/linting
- Creates: .session/ structure, guides, Claude Code slash commands, CLAUDE.md, README.md, CHANGELOG.md
- Quality gates disabled by default in minimal mode config
- Git repository and GitHub setup still available
- Comprehensive test coverage: 28 unit tests, 11 integration tests

Fixed

/end Command Fails on New Projects with Few Commits
- Changed git log --oneline HEAD~10..HEAD to git log --oneline -10 in end.md command
- The previous syntax fails when a repository has 10 or fewer commits, because HEAD~10 does not resolve to a commit
- Affects all solokit-initialized projects when running /end early in development
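
As a rough illustration of why the count form is safer than the range form, here is a minimal Python sketch. The helper names `recent_commits_command` and `recent_commits` are hypothetical and not part of Solokit; only the `git log --oneline -10` invocation comes from the entry above.

```python
import subprocess


def recent_commits_command(limit: int = 10) -> list[str]:
    """Build a `git log` invocation that is safe on short histories.

    `git log --oneline -<n>` prints at most n commits, so it works even
    when the repository has fewer than n. The range form HEAD~<n>..HEAD
    needs HEAD~<n> to resolve, which requires more than n commits.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return ["git", "log", "--oneline", f"-{limit}"]


def recent_commits(repo_dir: str, limit: int = 10) -> list[str]:
    """Return up to `limit` one-line commit summaries for `repo_dir`."""
    result = subprocess.run(
        recent_commits_command(limit),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling `recent_commits(".")` in a repository with only two commits would return two lines instead of raising an error.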
Semgrep CI Installation Failure in ml_ai_fastapi Template
- Replaced unpinned pip install semgrep with official returntocorp/semgrep-action@v1 GitHub Action
- Fixes CI security workflow failures caused by semgrep dependency resolution cascade
- The official action handles installation complexity and provides native GitHub integration
- Affects projects using ml_ai_fastapi template with CI/CD option enabled

[0.2.2] - 2025-12-05

Fixed

Missing test_execution.commands in Generated Config (PR #193)
- Added commands configuration to test_execution in session_structure.py
- Previously, sk validate would show ✗ tests: None because no test command was configured
- Now includes default commands for Python (pytest), JavaScript, and TypeScript (npm test)
- Affects all projects initialized with sk init or sk adopt
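
The shape of the added configuration might look roughly like the sketch below. This is an assumption for illustration: only the `test_execution` and `commands` keys and the command strings are taken from the entry above, while the language keys and the `default_test_command` helper are hypothetical.

```python
from typing import Optional

# Hypothetical excerpt of a generated config; the real file may use
# different language keys or a different nesting.
GENERATED_CONFIG = {
    "test_execution": {
        "commands": {
            "python": "pytest",
            "javascript": "npm test",
            "typescript": "npm test",
        }
    }
}


def default_test_command(config: dict, language: str) -> Optional[str]:
    """Look up the configured test command, or None if nothing is set."""
    commands = config.get("test_execution", {}).get("commands", {})
    return commands.get(language)
```

With an empty config the lookup returns `None`, which is the situation that previously surfaced as "tests: None" in validation output.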
Validation Message Showing "None" Instead of Error
- Fixed display bug where validation used dict.get() incorrectly
- Now properly shows "Tests fail" message instead of "None"
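
The exact code is not shown in this changelog, but this class of bug can be sketched as follows. The result dictionary with `name`, `passed`, and `message` keys and the `describe_check` function are assumptions made for the example.

```python
def describe_check(result: dict) -> str:
    """Return a human-readable status line for one validation check."""
    name = result.get("name", "check")
    if result.get("passed"):
        return f"✓ {name}"
    # `result.get("message")` alone returns None when the key is missing,
    # and an f-string renders that as the literal text "None". Falling
    # back to a fixed message covers both a missing key and a None value.
    message = result.get("message") or "Tests fail"
    return f"✗ {name}: {message}"
```

The `or` fallback is used instead of `get("message", "Tests fail")` because a default passed to `get` does not help when the key exists with a value of `None`.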

Changed

Updated config.schema.json with commands property for test_execution
Updated test fixtures and documentation with complete configuration examples

[0.2.1] - 2025-12-04

Security

Critical CVE Patches for Next.js and React Templates (PR #191)
- Updated Next.js from 16.0.1 to 16.0.7 to address CVE-2025-66478 (CVSS 10.0)
- Updated React from 19.2.0 to 19.2.1 to address CVE-2025-55182
- Updated react-dom from 19.2.0 to 19.2.1
- Updated eslint-config-next from 16.0.1 to 16.0.7
- Updated @next/bundle-analyzer from 16.0.1 to 16.0.7
- Affects all Next.js templates: fullstack_nextjs, saas_t3, dashboard_refine (all tiers)
- Reference: https://nextjs.org/blog/CVE-2025-66478

[0.2.0] - 2025-12-03

Fixed

Tailwind CSS v4 Migration for All Templates (fullstack_nextjs, saas_t3, dashboard_refine)
- Migrated from Tailwind v3 @tailwind directives to v4 @import "tailwindcss" syntax
- Moved theme configuration from JavaScript tailwind.config.ts to CSS @theme blocks
- Updated components to use theme tokens (bg-background, text-foreground) instead of hardcoded colors
- Fixed accessibility violation: Changed text-blue-500 to text-blue-700 for WCAG AA contrast compliance
- Updated all test files to match new theme-based styling expectations
- Added comprehensive Tailwind v4 documentation to ARCHITECTURE.md files
CHANGELOG Update Check False Positives in /end Command
- Changed check from HEAD~10 (last 10 commits) to main...HEAD (commits since branch creation)
- Previous logic gave false positives when a prior session had updated CHANGELOG
- Now correctly detects whether CHANGELOG was updated in the current session/branch
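
A branch-scoped check along these lines could be sketched in Python as shown below. Using `git diff --name-only` is an assumption (the entry only names the `main...HEAD` range), and both function names are hypothetical.

```python
import subprocess


def changelog_touched(changed_files: list[str]) -> bool:
    """Return True if any changed path is a file named CHANGELOG.md."""
    return any(path.split("/")[-1] == "CHANGELOG.md" for path in changed_files)


def changelog_updated_on_branch(repo_dir: str, base: str = "main") -> bool:
    """Check whether this branch changed CHANGELOG.md since it left `base`.

    `git diff base...HEAD` (three dots) compares HEAD with the merge base
    of `base` and HEAD, so edits made in earlier, already merged sessions
    are not counted.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return changelog_touched(result.stdout.splitlines())
```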

[0.1.7] - 2025-11-28

Changed

Improved /end Command Flow (PR #184)
- Streamlined /end command with clear 5-step process: pre-flight checks, completion status, run sk end, create PR, show results
- sk end no longer attempts commits: Claude handles git commits, and sk end only verifies and pushes
- Better error messages: Clear guidance when no commits found on branch
- Removed confusing two-option learning approach (commit tags vs temp file)
Slash Command Format for User-Facing Suggestions (PR #186)
- Updated all user-facing command suggestions to use slash command format (/start, /end, /validate) instead of CLI format (sk start, sk end)
- Updated 30 files including CLAUDE.md templates, Python CLI output, and guides
- Principle: user-facing suggestions use the / slash format; execution examples use the sk CLI format

Fixed

CHANGELOG.md Not Being Copied During sk init (PR #184)
- CHANGELOG.md template now properly copied to project root during initialization
- Won't overwrite existing CHANGELOG.md files
Git Commit Double-Wrapped Error Messages (PR #184)
- Fixed "Commit failed: Commit failed:" double-wrapped error messages
CI Template Permissions and Smoke Test Endpoint (PR #185)
- Added pull-requests: read permission to secrets-scan job in security.yml (required for gitleaks-action)
- Updated smoke-test in test.yml to test / instead of /api/health for Node.js templates
- Smoke tests no longer require database connectivity

Added

PRD and STACK Guide References in Templates (PR #183)
- Added "Writing PRDs" subsection under Claude Behavior Guidelines in all CLAUDE.md templates
- Updated "Reference Documentation" section with mandatory PRD guide reference
- Updated "Key Files" table to include PRD_WRITING_GUIDE.md and STACK_GUIDE.md

[0.1.6] - 2025-11-27

Changed

Minimal Scaffolding Migration: Phase 4 Documentation & Cleanup
- Updated main README.md:
  - Changed "production-ready templates" messaging to "minimal scaffolding templates"
  - Added PRD-driven development workflow
  - Updated template descriptions to emphasize documentation over example code
  - Updated test counts to 3,802 tests
  - Added Development Guides section for .session/guides/
- Updated template-registry.json:
  - All template descriptions now say "Minimal scaffolding for..."
  - Category descriptions updated to reflect minimal scaffolding approach
- Documentation consistency fixes across all 4 stacks:
  - Added "These are files you will CREATE" clarifications to ARCHITECTURE.md examples
  - Added "target structure you will build" notes to STACK_GUIDE.md file organization sections
  - Fixed path inconsistencies in CLAUDE.md.template files
  - Updated test counts in docs/project/ROADMAP.md and tests/e2e/README.md
- All 3,802 tests pass
Minimal Scaffolding Migration: dashboard_refine (Phase 3.4)
- Removed example code (~977 lines) from dashboard_refine template:
  - Removed app/(dashboard)/users/ directory (95 + 122 lines) - Example users list page and tests
  - Removed components/forms/user-form.tsx and tests (76 + 206 lines) - Example form component
  - Removed lib/validations.ts and tests (52 + 225 lines) - Example validation schemas
  - Removed tier-1-essential/tests/unit/example.test.tsx (77 lines) - Example tests
  - Removed tier-3-comprehensive/tests/e2e/user-management.spec.ts (124 lines) - User E2E tests
  - Removed tier-3-comprehensive/tests/integration/dashboard.test.ts (28 lines) - Example integration test
- Updated lib/refine.tsx:
  - Replaced mock data provider with placeholder that throws helpful errors
  - Cleared example resources array
  - Added comprehensive documentation for data provider setup
- Updated components/layout/sidebar.tsx: Simplified to Dashboard route only
- Updated app/(dashboard)/page.tsx: Minimal welcome page with guidance cards
- Updated app/(dashboard)/__tests__/page.test.tsx: Tests for minimal page
- Updated lib/__tests__/refine.test.tsx: Tests for placeholder data provider
- Updated components/layout/__tests__/sidebar.test.tsx: Tests for minimal sidebar
- Updated tier-3-comprehensive/tests/e2e/dashboard.spec.ts: Tests for minimal dashboard
- Updated providers/__tests__/refine-provider.test.tsx: Fixed mock to use empty resources array
- Created tier-3-comprehensive/tests/integration/api.test.ts: Placeholder integration test
- Updated ARCHITECTURE.md:
  - Added "Building From Scratch" section with complete Refine patterns
  - Updated "Decision 2" to explain data provider requirement
  - Added comprehensive "Data Provider Options" section with REST, GraphQL, Supabase, Custom examples
  - Updated project structure to reflect minimal scaffolding
- Updated CLAUDE.md.template:
  - Added "Building From Scratch" section with quick pattern reference
  - Updated file organization table with correct paths
  - Updated data provider warning to reflect placeholder behavior
- Cleaned up empty directories (tier-1-essential/tests/unit/, tier-3-comprehensive/tests/integration/)
Documentation Consistency Fixes (All 4 stacks)
- saas_t3 ARCHITECTURE.md: Moved "Building From Scratch" section from end to after Overview
- ml_ai_fastapi CLAUDE.md.template: Reordered sections (Building From Scratch before Stack Architecture Rules)
- dashboard_refine CLAUDE.md.template: Reordered sections (Building From Scratch before Stack Architecture Rules)
- dashboard_refine docker/README.md: Fixed "mock data provider" → "placeholder data provider" terminology
- fullstack_nextjs CLAUDE.md.template: Removed incorrect src/ prefix from file paths
- fullstack_nextjs ARCHITECTURE.md: Fixed lib/validations/[feature].ts → lib/validations.ts
- saas_t3 CLAUDE.md.template: Removed incorrect src/ prefix from file paths
Minimal Scaffolding Migration: saas_t3 (Phase 3.3)
- Removed example code (~400 lines) from saas_t3 template:
  - Removed server/api/routers/example.ts (33 lines) - tRPC CRUD router
  - Removed server/api/routers/__tests__/example.test.ts (220 lines) - Router tests
  - Removed components/example-component.tsx (22 lines) - tRPC usage example
  - Removed components/__tests__/example-component.test.tsx (117 lines) - Component tests
  - Removed tier-1-essential/tests/unit/example.test.tsx (24 lines) - Example tests
- Updated server/api/root.ts: Removed example router, kept commented example
- Updated app/page.tsx: Minimal welcome page (no tRPC usage)
- Updated app/__tests__/page.test.tsx: Tests for minimal page
- Updated server/api/__tests__/root.test.ts: Removed example router references
- Updated prisma/schema.prisma: Removed User model, kept commented example
- Updated tier-3-comprehensive/tests/e2e/home.spec.ts: Tests for minimal page
- Updated ARCHITECTURE.md:
  - Added "Building From Scratch" section with complete tRPC patterns
  - Added "Type Safety Flow" diagram showing Prisma → tRPC → React chain
  - Updated project structure to reflect minimal scaffolding
- Updated CLAUDE.md.template:
  - Added "Building From Scratch" section
  - Updated code patterns to use a generic "posts" example instead of the "example" name
- Added .gitkeep to preserve empty components/ and routers/ directories
Minimal Scaffolding Migration: ml_ai_fastapi (Phase 3.2)
- Removed example code (~430 lines) from ml_ai_fastapi template:
  - Removed src/api/routes/example.py (135 lines) - Full CRUD router
  - Removed src/models/example.py (61 lines) - SQLModel Item model
  - Removed src/services/example.py (115 lines) - Service layer example
  - Removed tests/unit/test_example.py (117 lines) - Example service tests
- Updated main.py.template: Removed example router import and include
- Updated models/__init__.py: Removed Item exports, added documentation example
- Updated core/database.py: Removed Item import, added documentation example
- Updated alembic/env.py: Removed Item import, added documentation example
- Updated tests/unit/test_api_routes.py: Removed TestItemRoutes, kept TestHealthRoutes
- Updated tests/integration/test_api.py: Removed TestItemAPIIntegration, kept TestHealthEndpoints
- Updated locustfile.py: Removed Item API tests, kept health check tests only
- Updated tier-4-production/src/core/monitoring.py: Removed example-specific counters
- Updated ARCHITECTURE.md:
  - Added "Building From Scratch" section with step-by-step guide
  - Updated project structure to reflect minimal scaffolding
- Updated CLAUDE.md.template:
  - Added "Building From Scratch" section
  - Added "Quick Pattern Reference" for adding new features
Minimal Scaffolding Migration: fullstack_nextjs (Phase 3.1)
- Removed example code (~1,206 lines) from fullstack_nextjs template:
  - Removed app/api/example/ route and tests
  - Removed components/example-component.tsx and tests
  - Removed lib/validations.ts (example Zod schemas) and tests
  - Removed User model from Prisma schema (kept commented example)
- Updated page.tsx to minimal welcome page (no example imports)
- Updated ARCHITECTURE.md:
  - Added "Building From Scratch" section with step-by-step guide
  - Removed references to example files in project structure
- Updated CLAUDE.md.template:
  - Added "Building From Scratch" section
  - Updated guidance to reference ARCHITECTURE.md instead of existing code
- Added health check test at app/api/health/__tests__/route.test.ts
- Updated E2E tests (flow.spec.ts) for minimal page
- Added .gitkeep to preserve empty components/ directory

Added

Quality Gate Adjustments for Minimal Scaffolding (Phase 2 of Minimal Scaffolding Migration)
- New scaffolding.py module with minimal scaffolding detection functions:
  - is_minimal_scaffolding(): Detects if project has only health check code
  - has_integration_test_files(): Checks for integration test files
  - has_e2e_test_files(): Checks for E2E test files (Playwright/Cypress)
- Integration tests now skip gracefully when no integration test files exist
- Integration tests handle missing spec files by skipping instead of failing
- Added 23 new tests for scaffolding module
- Added test for new integration checker behavior
- CI/CD workflows already use --if-present for conditional E2E/integration tests
- Coverage threshold only enforced when coverage report exists
- Tier configs use glob patterns that gracefully handle empty directories
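
The skip-instead-of-fail behavior can be approximated with the sketch below. The function names `has_integration_test_files` and `has_e2e_test_files` come from the entry above, but their bodies, the glob patterns, and the `integration_gate_status` helper are assumptions for illustration.

```python
from pathlib import Path

# Hypothetical glob patterns; the real module may look in other places.
INTEGRATION_PATTERNS = (
    "tests/integration/**/test_*.py",
    "tests/integration/**/*.test.ts",
)
E2E_PATTERNS = ("tests/e2e/**/*.spec.ts", "cypress/e2e/**/*.cy.ts")


def _has_match(root: Path, patterns: tuple[str, ...]) -> bool:
    """Return True if at least one pattern matches a path under `root`."""
    return any(next(root.glob(p), None) is not None for p in patterns)


def has_integration_test_files(root: Path) -> bool:
    return _has_match(root, INTEGRATION_PATTERNS)


def has_e2e_test_files(root: Path) -> bool:
    return _has_match(root, E2E_PATTERNS)


def integration_gate_status(root: Path) -> str:
    """Report "skipped" rather than a failure when there is nothing to run."""
    return "run" if has_integration_test_files(root) else "skipped"
```

Globbing a directory that does not exist yields no matches, so a freshly scaffolded project with no `tests/integration/` folder is reported as skipped.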
PRD Writing Guide and Stack Selection Guide
- New STACK_GUIDE.md: Comprehensive guide for choosing between the 4 Solokit stacks
  - Quick decision tree for stack selection
  - Detailed comparison matrix (type safety, API style, learning curve, etc.)
  - Per-stack profiles with best-for/not-ideal-for guidance
  - When-to-switch guidance
- New PRD_WRITING_GUIDE.md: Complete PRD writing guide for Claude-driven development
  - Rewritten for AI-assisted workflow (Claude writes PRD and implements code)
  - Vertical slices philosophy and INVEST principles
  - Technical constraints section with stack selection
  - Definition of Ready (DoR) checklist for Claude
  - Mapping PRD stories to Solokit work items
  - Claude-optimized PRD template
- Guides automatically copied to .session/guides/ during sk init
- Updated post-init message to reference guides and recommend PRD workflow
- New fixture tracking_template_files_with_guides for testing
- Added 6 new tests for guide functionality (3,773 → 3,779 total tests)
Format/Lint Auto-Fix Before Initial Commit
- New Step 19 in sk init workflow: runs format and lint auto-fix before initial commit
- Fixes user-provided files (PRD.md, ROADMAP.md, etc.) that may have formatting issues
- Node.js projects: runs npm run format (Prettier) and npm run lint:fix (ESLint)
- Python projects: runs ruff format . and ruff check --fix .
- Silent operation: only logs at debug level, non-blocking on failure
- Init flow updated from 20 to 21 steps (GitHub setup moved to Step 21)
- New module: src/solokit/init/format_lint_fixer.py
- Added 18 new tests for format_lint_fixer module (3,755 → 3,773 total tests)
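
A non-blocking auto-fix step of this kind might be structured as in the following sketch. The four commands are the ones listed above; the `"node"` and `"python"` labels and the function names are hypothetical and do not describe the actual format_lint_fixer.py module.

```python
import logging
import subprocess
from pathlib import Path

logger = logging.getLogger(__name__)

# Commands from the entry above, keyed by a hypothetical project-type label.
AUTO_FIX_COMMANDS = {
    "node": [["npm", "run", "format"], ["npm", "run", "lint:fix"]],
    "python": [["ruff", "format", "."], ["ruff", "check", "--fix", "."]],
}


def auto_fix_commands(project_type: str) -> list[list[str]]:
    """Return the fixer commands for a project type, or none if unknown."""
    return AUTO_FIX_COMMANDS.get(project_type, [])


def run_auto_fix(project_dir: Path, project_type: str) -> None:
    """Run each fixer; log failures at debug level and keep going."""
    for command in auto_fix_commands(project_type):
        try:
            subprocess.run(command, cwd=project_dir, capture_output=True, check=True)
        except (OSError, subprocess.CalledProcessError) as exc:
            # A missing tool or a non-zero exit must not block the commit.
            logger.debug("auto-fix step %s failed: %s", " ".join(command), exc)
```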

[0.1.5] - 2025-11-26

Added

GitHub Repository Setup Integration
- New src/solokit/github/ module for post-init GitHub repository setup
- Interactive prompts to create new repo or connect to existing one
- Supports both gh CLI and manual remote configuration
- Added check_git_installed() and check_gh_installed() to environment validator
- Integrated as Step 20 in sk init workflow (after initial commit)
- Added 45 new tests for GitHub setup module (3,710 → 3,755 total tests)
Safe Config Implementation for sk adopt
- Added intelligent file categorization: NEVER_OVERWRITE, MERGE_IF_EXISTS, INSTALL_IF_MISSING
- New backup system: all modified files backed up to .solokit-backup/<timestamp>/
- Smart merge strategies for 7 file types: package.json, pyproject.toml, eslint.config.mjs, .prettierrc, .pre-commit-config.yaml, requirements.txt, .husky/pre-commit
- Added --dry-run flag to preview changes without modifications
- Improved warning message showing categorized file handling
- Added 119 new tests for backup and merge modules (3,591 → 3,710 total tests)
- Test breakdown: 3,452 unit + 178 integration + 80 e2e tests
- New files: adopt/backup.py, adopt/merge_strategies.py
- All quality checks passing: ruff, mypy, formatting, 97% coverage
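
The categorize-then-back-up flow can be sketched as below. The three category names and the `.solokit-backup/<timestamp>/` location come from the entry above; the file-to-category assignments, the function names, and the timestamp format used in the example are assumptions.

```python
import shutil
from enum import Enum
from pathlib import Path


class FileCategory(Enum):
    NEVER_OVERWRITE = "never_overwrite"
    MERGE_IF_EXISTS = "merge_if_exists"
    INSTALL_IF_MISSING = "install_if_missing"


# Illustrative assignments only; the real mapping may differ.
MERGEABLE = {"package.json", "pyproject.toml", "requirements.txt", ".prettierrc"}
PROTECTED = {"README.md"}


def categorize(relative_path: str) -> FileCategory:
    """Pick a handling category from the file name alone."""
    name = Path(relative_path).name
    if name in PROTECTED:
        return FileCategory.NEVER_OVERWRITE
    if name in MERGEABLE:
        return FileCategory.MERGE_IF_EXISTS
    return FileCategory.INSTALL_IF_MISSING


def backup_file(project_root: Path, relative_path: str, timestamp: str) -> Path:
    """Copy a file to .solokit-backup/<timestamp>/ before it is modified."""
    source = project_root / relative_path
    target = project_root / ".solokit-backup" / timestamp / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    return target
```

Keeping the relative path under the timestamp directory means a backup can be restored by copying the directory contents back over the project root.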

Security

Fixed Sentry Security Vulnerability (GHSA-6465-jgvq-jhgp)
- Upgraded @sentry/nextjs from 10.23.0 to 10.27.0 in all Next.js templates
- Vulnerability: Sensitive headers leaked when sendDefaultPii is set to true
- Affected versions: 10.11.0 - 10.26.0
- Updated: stack-versions.yaml and all tier-4 package.json.tier4.template files
Fixed npm audit vulnerabilities
- Added tmp: 0.2.5 override to fix @lhci/cli vulnerability
- All Next.js templates now pass npm audit with 0 vulnerabilities

Fixed

TypeScript Type Conflicts in E2E Tests
- Fixed AxeBuilder type conflict between @playwright/test and @axe-core/playwright
- Applied an "as any" cast as a workaround in all tier-3 e2e test files
- Affected: saas_t3, dashboard_refine, fullstack_nextjs templates
Prettier Formatting Issues in Templates
- Fixed ARCHITECTURE.md and CLAUDE.md.template formatting across all stacks
- Fixed blank line handling in readme_generator.py and claude_md_generator.py
- All templates now pass prettier --check
E2E Test Python Executable
- Fixed test_core_session_workflow.py to use sys.executable instead of hardcoded python3
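
The pattern behind this fix looks roughly like the sketch below; `run_python_snippet` is a hypothetical helper, not code from the test file.

```python
import subprocess
import sys


def run_python_snippet(code: str) -> str:
    """Run `code` with the interpreter that is executing this script.

    A hardcoded "python3" may be missing from PATH (for example on
    Windows) or may point at a different interpreter than the active
    virtual environment. `sys.executable` avoids both problems.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```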

[0.1.4] - 2025-11-24

Added

Comprehensive Test Coverage Improvements
- Increased overall test coverage from 93% to 96% (+3 percentage points)
- Added 242 new tests (2,983 → 3,225 total tests)
- Test breakdown: 2,980 unit + 165 integration + 80 e2e tests
- Files improved to >95% coverage:
  - protocols.py: 65% → 100%
  - dependency_installer.py: 78% → 100%
  - readme_generator.py: 80% → 100%
  - git_context.py: 83% → 98%
  - environment_validator.py: 85% → 100%
  - initial_commit.py: 85% → 100%
  - env_generator.py: 86% → 100%
  - git_hooks_installer.py: 89% → 96%
  - template_installer.py: 89% → 100%
  - gitignore_updater.py: 90% → 99%
  - extractor.py: 85% → 96%
  - curator.py: 86% → 97%
  - archiver.py: 90% → 100%
  - documentation_loader.py: 90% → 100%
  - spec_validator.py: 86% → 100%
  - updater.py: 86% → 98%
  - repository.py: 91% → 100%
  - tree.py: 85% → 98%
  - integration_runner.py: 86% → 100%
  - performance.py: 87% → 99%
  - formatter.py: 88% → 97%
  - query.py: 88% → 95%
  - quality/checkers/base.py: 89% → 100%
  - quality/reporters/base.py: 86% → 100%
- Created 4 new test files:
  - tests/unit/quality/checkers/test_base.py
  - tests/unit/session/briefing/test_documentation_loader.py
  - tests/unit/session/briefing/test_formatter.py
  - tests/unit/session/briefing/test_git_context.py
- All quality checks passing: ruff, mypy, formatting
Three-File Documentation Model for Project Initialization
- Implemented comprehensive documentation structure with distinct purposes:
  - README.md: Quick start guide (generated, project-specific)
  - ARCHITECTURE.md: Technical documentation (static template, comprehensive)
  - CLAUDE.md: AI guidance for Claude Code (generated from template)
- Created ARCHITECTURE.md files for all 4 stacks with comprehensive technical documentation:
  - Architecture decisions with rationale and trade-offs
  - Code patterns and examples
  - Project structure explanations
  - Database workflows
  - Troubleshooting guides
- Created CLAUDE.md.template files for all 4 stacks with:
  - Stack-specific architecture rules and patterns
  - Comprehensive Solokit command usage guide
  - Claude behavior guidelines
  - Work item management instructions
  - Session workflow documentation
  - Learning capture best practices
  - Stack-specific anti-patterns and common mistakes
- Added claude_md_generator.py module for CLAUDE.md generation
- Integrated CLAUDE.md generation into sk init workflow (Step 6)
- Updated orchestrator step numbering to be sequential (1-19)
- Affects: All stacks (saas_t3, ml_ai_fastapi, dashboard_refine, fullstack_nextjs)

Changed

README Generator Improvements
- Implemented cumulative quality gates (each tier includes all previous tiers' requirements)
- Added stack-aware quality gates from template registry:
  - JavaScript stacks: E2E tests (Playwright), Bundle analysis, Lighthouse CI
  - Python stacks: Load testing (Locust), API documentation (OpenAPI), Performance profiling
- Fixed uvicorn command for Python stack: uvicorn src.main:app --reload (was main:app)
- Added environment setup section with .env.local instructions
- Added database setup section (Prisma for npm stacks, Alembic for Python stacks)
- Improved additional options display using registry names and descriptions
- Added ARCHITECTURE.md reference section to all generated READMEs
- Affects: All stacks
Template Registry Enhancements
- Made quality gates stack-aware with adds_js and adds_python fields
- Tier 3 quality gates now separate JavaScript-specific (Playwright) from Python-specific (Locust)
- Tier 4 quality gates now separate JavaScript-specific (Bundle analysis, Lighthouse) from Python-specific (OpenAPI, Performance profiling)
- Removed stack_specific field (replaced with cleaner adds_js/adds_python structure)
- Affects: template-registry.json
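
To illustrate how cumulative, stack-aware gates could be resolved, here is a small sketch. Only the `adds_js` and `adds_python` field names and the tier-3 and tier-4 gate names come from the entries above; the tier keys, the shared `adds` field, and the tier-1 and tier-2 gates are placeholders.

```python
# Hypothetical registry excerpt, ordered from lowest to highest tier.
TIERS = [
    {"name": "tier-1", "adds": ["Linting"], "adds_js": [], "adds_python": []},
    {"name": "tier-2", "adds": ["Security scanning"], "adds_js": [], "adds_python": []},
    {
        "name": "tier-3",
        "adds": [],
        "adds_js": ["E2E tests (Playwright)"],
        "adds_python": ["Load testing (Locust)"],
    },
    {
        "name": "tier-4",
        "adds": [],
        "adds_js": ["Bundle analysis", "Lighthouse CI"],
        "adds_python": ["API documentation (OpenAPI)", "Performance profiling"],
    },
]


def cumulative_gates(tier_name: str, stack_kind: str) -> list[str]:
    """Collect gates for `tier_name` plus every lower tier.

    `stack_kind` is "js" or "python" and selects the stack-specific field.
    """
    stack_field = {"js": "adds_js", "python": "adds_python"}[stack_kind]
    gates: list[str] = []
    for tier in TIERS:
        gates.extend(tier["adds"])
        gates.extend(tier[stack_field])
        if tier["name"] == tier_name:
            return gates
    raise KeyError(f"unknown tier: {tier_name}")
```

Splitting stack-specific gates into two parallel fields keeps the lookup a plain key access, with no per-stack branching inside the loop.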

Fixed

CI Workflow Improvements for GitHub Actions
- Fixed Lighthouse CI Chrome sandbox issues on GitHub Actions runners
  - Added puppeteerLaunchArgs with --no-sandbox, --disable-dev-shm-usage, --disable-gpu flags
  - Simplified lighthouse npm script to just lhci autorun
  - Added Playwright browser installation step to lighthouse workflow
- Fixed Gitleaks failing on initial commit (no parent to compare against)
  - Added condition to skip secrets-scan on initial push events
  - PRs always run secrets-scan correctly
- Fixed dependency-review-action failing when Dependency Graph not enabled
  - Added conditional check via GitHub API before running dependency review
  - Shows warning if Dependency Graph is not enabled instead of failing
- Removed custom CodeQL workflow to avoid conflicts with GitHub's default setup
  - GitHub's default CodeQL setup is recommended (enable in repo Settings > Security)
- Added /api/health endpoint to all Next.js stacks for smoke tests
- Affects: All Next.js stacks (fullstack_nextjs, saas_t3, dashboard_refine)
Python Stack (ml_ai_fastapi) CI Workflow Fixes
- Upgraded FastAPI from 0.115.6 to 0.121.3 to resolve starlette version conflict
- Added setuptools>=78.1.1 to security dependencies to fix pip-audit vulnerability
- Fixed vulture false positives by renaming unused parameters to _logger and _method_name
- Fixed Pyright missing imports by installing all deps (dev + prod) in quality-check workflow
- Fixed detect-secrets to handle missing .secrets.baseline gracefully
- Fixed pip-audit to run without --require-hashes (requirements.txt typically doesn't have hashes)
- Added conditional Dependency Graph check for dependency-review-action
- Fixed cosmic-ray installation for mutation testing workflow
- Removed .secrets.baseline from .gitignore (should be tracked in git)
- Affects: ml_ai_fastapi stack
Playwright Browser Installation and System Dependencies
- Fixed Playwright browsers not launching on Linux due to missing system dependencies
- Added automatic apt_pkg Python module fix for Ubuntu 22.04 (common symlink issue)
- Added automatic sudo npx playwright install-deps execution on Linux during sk init
- Browser binaries are now properly installed AND system dependencies are configured
- Impact: E2E tests, A11y tests, and Lighthouse CI now work out-of-the-box on fresh Linux VMs
- Affects: All Next.js stacks (fullstack_nextjs, saas_t3, dashboard_refine) at tier-3+
Lighthouse CI Chrome Detection
- Fixed "Chrome installation not found" error in Lighthouse CI on local/VM environments
- Added scripts/lighthouse.sh wrapper that auto-detects Chrome/Chromium location
- On GitHub Actions: Uses pre-installed Chrome at /usr/bin/google-chrome-stable
- On local/VM: Falls back to Playwright's Chromium if system Chrome not found
- No longer requires manual Chrome installation for local development
- Affects: All Next.js stacks at tier-4-production
ESLint Configuration Deprecation Warning
- Fixed ".eslintignore file is no longer supported" warning in ESLint 9+
- Migrated all ignore patterns from .eslintignore to eslint.config.mjs ignores array
- Removed deprecated .eslintignore files from all tier-1-essential templates
- Added comprehensive ignore patterns: playwright-report, test-results, .stryker-tmp, .lighthouseci, etc.
- Affects: All Next.js stacks (fullstack_nextjs, saas_t3, dashboard_refine)
Template File Formatting
- Fixed Prettier formatting issues in template files
- Re-formatted all template files using project's .prettierrc config (printWidth: 100)
- Previously formatted with default Prettier settings causing format check failures
- Affects: All stacks
Test Script Lighthouse CI Support
- Added lighthouse.yml workflow parsing to test_all_templates.py
- Lighthouse CI checks now run for tier-4-production projects regardless of ci_cd option
- Fixed early return condition that skipped workflow checks when ci_cd option not selected

Added

README Documentation for E2E, A11y, and Lighthouse
- Added Accessibility Testing section to README for projects with a11y option
- Added Lighthouse CI section to README for tier-4-production projects
- Documents that Playwright's Chromium is used automatically for Lighthouse
New User VM Test Guide
- Added comprehensive testing guide: analysis-docs/NEW_USER_VM_TEST_GUIDE.md
- Step-by-step instructions for testing Solokit on fresh GCP VMs
- Automated test script: scripts/test-new-user-experience.sh
- Tests all 4 stacks with tier-4 and all options
Phase 4 Test Failures for Next.js Stacks
- Fixed fullstack_nextjs mutation test failures by creating tier-specific Jest environment configurations
- Added tier-3 test file overrides using @stryker-mutator/jest-runner/jest-env/node for API route tests
- Kept tier-1/tier-2 test files with standard @jest-environment node for compatibility
- Fixed Lighthouse CI workflow placement: moved from a11y option to tier-4-production (where script exists)
- Created dedicated lighthouse.yml workflows in tier-4-production for all Next.js stacks
- Added PORT environment variable support to Playwright configs for parallel test execution
- Updated test script to provide DATABASE_URL environment variable for Next.js dev servers
- Optimized test timeouts: mutation tests (600s), regular tests (300s), default (120s)
- Impact: All 192 phase-4 tests now passing across all stacks (saas_t3, dashboard_refine, fullstack_nextjs, ml_ai_fastapi)
- Affects: fullstack_nextjs, saas_t3, dashboard_refine (all tiers with ci_cd option)
- Files modified:
  - Template files: 14 files (playwright configs, test files, workflow files)
  - Test infrastructure: test_all_templates.py (environment and timeout configuration)
Tier-Aware CI/CD Workflows for ml_ai_fastapi
- Made CI/CD workflows respect tier-based tool availability to prevent false failures
- Added conditional execution for pylint duplicate code check (tier-3+ only) using if: hashFiles('.pylintrc') != ''
- Added conditional execution for cosmic-ray mutation tests (tier-3+ only) via config detection step
- Added conditional execution for Bandit security linting (tier-2+ only) via config detection step
- Updated test script to evaluate GitHub Actions hashFiles() expressions and skip steps with failing conditions
- Updated test script to skip conditional check steps (e.g., "Check if cosmic-ray config exists")
- Added --stack option to test script for running all phase-4 tests for a specific stack (48 tests per stack)
- Impact: Prevents tier-1 and tier-2 tests from failing due to missing tier-3 quality tools
- Affects: ml_ai_fastapi (all tiers with ci_cd option)
- Fixed 24 out of 48 ml_ai_fastapi phase-4 test failures
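
A test script could approximate the `hashFiles()` condition locally along the lines of this sketch. It handles only the simple `hashFiles('<pattern>') != ''` form quoted above, and the `step_should_run` function is hypothetical rather than taken from the actual test script.

```python
import re
from pathlib import Path

# Matches only the simple form used in the entry above:
#   hashFiles('<pattern>') != ''
_HASHFILES_NONEMPTY = re.compile(r"^\s*hashFiles\('([^']+)'\)\s*!=\s*''\s*$")


def step_should_run(condition: str, workspace: Path) -> bool:
    """Approximate a GitHub Actions `if:` that tests for a file's presence.

    hashFiles() returns an empty string when no file matches the pattern,
    so `hashFiles(p) != ''` is true exactly when at least one match exists.
    Conditions in any other form default to True (run the step).
    """
    match = _HASHFILES_NONEMPTY.match(condition)
    if match is None:
        return True
    return any(path.is_file() for path in workspace.glob(match.group(1)))
```

In a tier-1 project without a `.pylintrc`, `step_should_run("hashFiles('.pylintrc') != ''", project_dir)` would return `False`, so the duplicate-code step is skipped instead of failing.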

Added

Code Duplication Detection for Python Stack (Session 4)
- Added pylint 3.3.3 with duplicate-code checking to ml_ai_fastapi tier-3 dependencies
- Created .pylintrc configuration for code duplication thresholds
- Added duplication check step to ml_ai_fastapi quality-check.yml workflow
- Updated stack-versions.yaml with pylint version and installation command
- Updated test script to recognize and run pylint commands
- Impact: All 4 stacks now have consistent code duplication detection at tier-3+
- JavaScript stacks: jscpd, Python stack: pylint
- Resolves: Session 4 (Code Duplication Detection) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
Type Coverage Enforcement for Python Stack (Session 5)
- Added mypy type coverage check to ml_ai_fastapi quality-check.yml workflow
- Configured mypy with --disallow-untyped-defs and --disallow-incomplete-defs flags
- Updated test script to recognize and run mypy commands
- Impact: All 4 stacks now enforce type coverage at tier-3+
- JavaScript stacks: type-coverage tool (95%), Python stack: mypy strict checking
- Resolves: Session 5 (Type Coverage Enforcement) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
Comprehensive Unit Tests for Python Stack (Session 6)
- Added 29 comprehensive unit tests achieving 94.54% coverage (up from 75.63%)
- Created test_api_routes.py: Tests for API endpoints and health checks with error scenarios
- Created test_database.py: Tests for database connections and dependency injection with mocking
- Created test_main.py: Tests for application startup, lifespan, and API documentation endpoints
- Coverage breakdown: Dependencies (100%), Database (100%), Main app (100%), Models (100%), Services (100%)
- Impact: Any coverage threshold (60%, 80%, 90%) selected during initialization will now pass
- Resolves: Session 6 (Unit Tests with Coverage) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
Integration Tests Verification (Session 7)
- Verified all 4 stacks have integration tests properly configured in tier-3-comprehensive
- Confirmed integration test scripts in package.json/pytest for all stacks
- Verified test.yml workflows run integration tests without continue-on-error flags
- JavaScript stacks: Jest/Vitest with integration test directories
- Python stack: pytest with real HTTP client integration tests
- Impact: All stacks at tier-3+ have working integration tests in CI
- Resolves: Session 7 (Integration Tests) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md

Fixed

E2E Tests Failing in fullstack_nextjs (Session 8)
- Fixed PrismaClientInitializationError during E2E tests caused by database queries in Server Components
- Modified app/page.tsx to gracefully handle database connection errors with try-catch
- Added fallback data for E2E test environment when database is unavailable
- Fixed TypeScript type errors in lib/__tests__/prisma.test.ts using Object.defineProperty
- Impact: E2E tests now pass without requiring database setup, matching saas_t3/dashboard_refine patterns
- Root cause: fullstack_nextjs uses Server Components with direct Prisma queries (unlike other stacks)
- Affects: fullstack_nextjs template, all tiers
- Resolves: Session 8 (E2E Tests) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
Mutation Testing Configuration Consistency (Session 9)
- Fixed ml_ai_fastapi mutation testing to match JavaScript stack tiering pattern
- Replaced mutmut with Cosmic Ray 8.4.3 (mutmut incompatible with src/ directory layouts)
- Removed [cosmic-ray] configuration from base, tier-1, and tier-2 templates
- Updated tier-3 and tier-4 pyproject.toml to use cosmic-ray==8.4.3 in quality dependencies
- Deleted obsolete mutmut_config.py.template from tier-3-comprehensive
- Updated CI/CD workflow to use cosmic-ray commands (cosmic-ray init/exec/cr-report)
- Updated stack-versions.yaml with cosmic-ray 8.4.3
- Verified cosmic-ray session creation (200 mutation jobs) on test projects
- Added __tests__ pattern to Jest testMatch in 6 JavaScript configs for better test discovery
- Enhanced saas_t3 test coverage to 92.85% function coverage
- Impact: Mutation testing now introduced in tier-3, inherited by tier-4 (consistent across all stacks)
- Affects: ml_ai_fastapi (all tiers), saas_t3/dashboard_refine/fullstack_nextjs (tier-3/tier-4)
- Resolves: Session 9 (Mutation Testing) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
Production Build Quality Gate (Session 10)
- Added explicit production build step to quality-check.yml for all 3 Next.js stacks
- Production build now runs as final quality gate before PR merge
- Updated test script to handle build steps based on job context (skipped in setup jobs, run in quality jobs)
- Enhanced npm command parsing to handle all npm commands (not just npm run)
- Impact: Build failures now caught during quality checks, preventing broken production builds
- Affects: saas_t3, dashboard_refine, fullstack_nextjs (all tiers with CI/CD option)
- Resolves: Session 10 (Production Build) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
Bundle Analysis Integration (Session 11)
- Integrated @next/bundle-analyzer in next.config.ts for all tier-4 Next.js stacks
- Bundle analyzer enabled via ANALYZE=true environment variable
- Bundle analysis job added to build.yml workflow (uploads artifacts for 30 days)
- Updated test script to parse security.yml and build.yml workflows
- Impact: All tier-4 production templates now have bundle size monitoring in CI/CD
- Affects: saas_t3/tier-4-production, dashboard_refine/tier-4-production, fullstack_nextjs/tier-4-production
- Resolves: Session 11 (Bundle Analysis) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
Security Scanning Enforcement (Session 12)
- Removed continue-on-error flags from all security checks across all 4 stacks
- JavaScript stacks: npm audit and dependency-review-action now fail CI on vulnerabilities
- Python stack: Bandit, pip-audit, and Semgrep now fail CI on security issues
- Fixed .bandit configuration syntax from INI/Python hybrid to proper YAML format
- Added .eslintignore files to all JavaScript stacks to exclude generated report files
- Updated .prettierignore to include report/ directory in all JavaScript stacks
- Removed duplicate quality checks (type check, lint) from build.yml in all Next.js stacks
- Added security.yml workflow to test script parsing
- Impact: Security vulnerabilities now block CI/CD pipeline instead of being warnings
- Affects: All 4 stacks (saas_t3, dashboard_refine, fullstack_nextjs, ml_ai_fastapi), all tiers with CI/CD
- Resolves: Session 12 (Security Scanning) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
Accessibility Testing Infrastructure (Session 13)
- Tagged all accessibility tests with @a11y marker for proper test discovery
- saas_t3: 1 test tagged (home.spec.ts)
- fullstack_nextjs: 1 test tagged (flow.spec.ts)
- dashboard_refine: 3 tests tagged (dashboard.spec.ts, user-management.spec.ts)
- npm run test:a11y now correctly finds and runs all 5 accessibility tests
- Tests use @axe-core/playwright to scan for WCAG 2.0/2.1 Level A & AA violations
- Impact: Accessibility testing workflow now functional for all Next.js stacks
- Affects: saas_t3, dashboard_refine, fullstack_nextjs (tier-3+, a11y option)
- Resolves: Session 13 (Accessibility Testing) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
Lighthouse CI Configuration & Build Workflow Optimization (Session 14)
- Removed overly strict "lighthouse:recommended" preset from Lighthouse configuration
- Now uses explicit assertions for meaningful metrics: 90% category scores + Core Web Vitals
- Increased LCP threshold for dashboard_refine to 3500ms (accounts for Refine framework overhead)
- Removed duplicate build job from build.yml workflow in all Next.js stacks
- Simplified build.yml to only bundle-analysis job (builds independently with ANALYZE=true)
- Added Prisma client generation step to quality-check.yml for saas_t3 and fullstack_nextjs
- Production build now solely in quality-check.yml as proper quality gate
- Impact: Cleaner CI/CD workflows, no redundant builds, Lighthouse focuses on practical metrics
- Affects: saas_t3, dashboard_refine, fullstack_nextjs (tier-4-production, a11y option)
- Resolves: Session 14 (Lighthouse CI) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
Complexity Analysis Enforcement (Session 15)
- Added ESLint complexity rules to all JavaScript stack tier-3 and tier-4 configs
- JavaScript complexity rules: cyclomatic (max 10), max-depth (4), max-nested-callbacks (4), max-lines-per-function (100)
- Test files exempted from max-nested-callbacks and max-lines-per-function (describe/it blocks naturally nest deeply)
- Updated Python Radon check to enforce thresholds: radon cc --max B fails build if complexity exceeds grade B
- ESLint handles both linting and complexity for JavaScript (industry standard approach)
- Radon provides separate complexity analysis for Python (distinct from ruff linting)
- Impact: All stacks at tier-3+ now enforce code complexity standards to maintain readability
- JavaScript: Complexity violations show as ESLint errors during npm run lint
- Python: Complexity violations fail radon cc step in quality-check.yml workflow
- Affects: All 4 stacks (saas_t3, dashboard_refine, fullstack_nextjs, ml_ai_fastapi), tier-3+
- Resolves: Session 15 (Complexity Analysis) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
Dead Code Detection (Session 16)
- Added ts-prune check to all JavaScript stack quality-check.yml workflows
- JavaScript: npm run check:unused runs ts-prune to detect unused exports
- ts-prune already installed in tier-3 package.json (version 0.10.3) with check:unused script
- Updated Python Vulture configuration to reduce false positives
- Increased Vulture min-confidence from 80 to 90 (fewer false positives)
- Added explicit excludes for tests, __pycache__, and alembic/versions
- Added ignore-names for common protocol parameters (method_name, logger) required by frameworks
- Fixed test script to use shlex.split() instead of str.split() for proper handling of quoted arguments
- Impact: All stacks at tier-3+ now detect and prevent dead/unused code
- JavaScript: ts-prune detects unused exports during quality checks
- Python: Vulture detects unused code in src/ with 90% confidence threshold
- Affects: All 4 stacks (saas_t3, dashboard_refine, fullstack_nextjs, ml_ai_fastapi), tier-3+
- Resolves: Session 16 (Dead Code Detection) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
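
The shlex.split() change in this entry matters whenever a workflow command contains quoted arguments. A minimal sketch of the difference, using a made-up command string (an assumption for illustration, not taken from the actual test script):

```python
import shlex

# Hypothetical workflow command with a quoted argument that contains a space.
command = 'vulture src/ --ignore-names "method_name, logger"'

# str.split() breaks on every space, so the quoted argument is torn apart
# and the quote characters stay inside the tokens.
naive_tokens = command.split()

# shlex.split() follows POSIX shell quoting rules and keeps it as one token.
shell_tokens = shlex.split(command)

print(naive_tokens)
print(shell_tokens)
```

With str.split() the quoted value ends up as two tokens, which is the kind of mismatch a workflow parser would then report as a bogus argument.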
Template Registry Documentation Update (Session 18)
- Updated template-registry.json to accurately reflect all implemented quality checks
- Tier-3 comprehensive now documents all tools: ts-prune, jscpd, vulture, pylint, Radon, cosmic-ray
- Tier-4 production now includes: Bundle analysis, Lighthouse CI, structured logging
- Added stack_specific sections to clarify JavaScript vs Python tooling differences
- Updated tier-3 to specify "E2E tests (Playwright for JS stacks)" - Python stack uses integration tests
- Updated tier-4 to document actual features: Bundle analysis (@next/bundle-analyzer), Lighthouse CI (90% scores)
- Changed tier-4 description from "Operations + Deployment" to "Operations + Monitoring + Performance"
- Updated metadata last_updated date to 2025-11-19
- Impact: Documentation now matches implementation, users know exactly what tools are used at each tier
- Affects: template-registry.json (user-facing documentation)
- Resolves: Session 18 (Update Template Registry Documentation) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md

Fixed (from previous sessions)

Template Type Check Failures Across All Stacks
- Fixed dashboard_refine: Refine v5 Pagination API changed from current to currentPage
  - Updated lib/__tests__/refine.test.tsx to use correct Pagination interface
- Fixed fullstack_nextjs: TypeScript read-only property errors in test environment setup
  - Updated lib/__tests__/prisma.test.ts to use type assertions for process.env.NODE_ENV
- Fixed ml_ai_fastapi: Pyright unable to resolve imports (missing venv configuration)
  - Added venvPath and venv settings to pyrightconfig.json
- Fixed import ordering issues in ml_ai_fastapi template files (13 files)
- Impact: All CI/CD type checks now pass for tier-4-production across all stacks
- Affects: All templates (dashboard_refine, fullstack_nextjs, saas_t3, ml_ai_fastapi), all tiers
- Resolves: Session 2 (Type Checking) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
Pre-commit Option Conflict with Tier-2+ Templates
- Removed redundant "Pre-commit" additional option from initialization
- Tier-2+ templates already include Husky git hooks built-in (.husky/pre-commit)
- Python pre-commit framework option was creating duplicate hook systems
- Deleted all template pre-commit/ directories (saas_t3, ml_ai_fastapi, fullstack_nextjs, dashboard_refine)
- Updated init.py to show only 3 additional options: CI/CD, Docker, Env Templates
- Updated template-registry.json and template_installer.py
- Updated all documentation and command files to reflect the change
- Updated all test files to remove pre_commit from test cases
- Impact: Cleaner user experience, no conflicting hook systems, tier-1 users can upgrade to tier-2 for git hooks
- Affects: All templates, all tiers
- Rationale: Tier-2+ already has Husky (90% of users), tier-1 users can upgrade to tier-2, advanced users can manually install Python pre-commit if needed
Critical Tier-3 and Tier-4 Test Suite Failures
- Fixed Jest configuration to exclude Playwright e2e tests from Jest runs (6 files)
  - Added testMatch to only run unit and integration tests
  - Added testPathIgnorePatterns to exclude /tests/e2e/
  - Added transformIgnorePatterns for ESM dependencies (superjson, @trpc)
  - Added moduleNameMapper for path aliases
  - Created dedicated jest.config.ts files for tier-3 and tier-4 templates
- Fixed package.json test scripts to separate test types (6 files)
  - Added test:unit - Run unit tests only
  - Added test:integration - Run integration tests only
  - Added test:e2e - Run Playwright e2e tests only
  - Added test:all - Run all test types sequentially
  - Updated tier-3 and tier-4 package.json templates for all stacks
- Replaced broken integration test examples with working placeholders (3 files)
  - Removed ESM server imports that caused "Cannot use import statement outside a module" errors
  - Added educational placeholder tests demonstrating proper structure
  - Tests now pass immediately after project initialization
- Impact: All tier-3 and tier-4 projects now have working test suites out of the box
- Affects: saas_t3, fullstack_nextjs, dashboard_refine templates
- Resolves: "TransformStream is not defined" and ESM import errors

Added

Urgent Flag for Single Immediate-Priority Work Items
- Added --urgent flag to sk work-new command for marking work items that require immediate attention
- Exclusive single-item constraint: only ONE work item can be urgent at a time
- User confirmation prompt when setting a new urgent item (clears existing urgent flag)
- Urgent items override all priority levels and dependency ordering
- Visual ⚠️ indicator in sk work-list output for urgent items
- sk work-next always returns urgent items first, ignoring dependencies
- Added --clear-urgent flag to sk work-update command for manual clearing
- Auto-clear urgent flag when work item status changes to completed
- Added urgent status question to /work-new slash command (interactive UI)
- Backward compatible: work items without urgent field default to false
- Added 24 unit tests (repository, scheduler, updater) with 90%+ coverage
- Added 11 integration tests for end-to-end urgent workflow
- Updated command documentation: work-new.md, work-update.md, work-list.md, work-next.md
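
The single-item constraint and the "urgent first" ordering described in this entry can be sketched roughly as follows. The WorkItem dataclass, the numeric priority field, and the function names are hypothetical simplifications, not the actual Solokit repository or scheduler code:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkItem:
    """Hypothetical, simplified work item; the real schema may differ."""
    id: str
    priority: int = 2            # assumption: lower number = higher priority
    status: str = "not_started"
    urgent: bool = False         # items without the field default to False
    dependencies: list[str] = field(default_factory=list)


def set_urgent(items: list[WorkItem], item_id: str) -> None:
    """Mark one item urgent and clear the flag on every other item."""
    if all(item.id != item_id for item in items):
        raise KeyError(item_id)
    for item in items:
        item.urgent = item.id == item_id


def next_item(items: list[WorkItem]) -> WorkItem | None:
    """Return the open urgent item if there is one, else the best ready item."""
    open_items = [i for i in items if i.status != "completed"]
    for item in open_items:
        if item.urgent:
            return item  # urgent wins regardless of priority and dependencies
    done = {i.id for i in items if i.status == "completed"}
    ready = [i for i in open_items if all(d in done for d in i.dependencies)]
    return min(ready, key=lambda i: i.priority, default=None)
```

Because set_urgent() rewrites the flag on every item, at most one item can be urgent after any call, which is the invariant the confirmation prompt protects.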
Essential CLI Commands: help, version, doctor, config show
- Added sk help command to display all commands organized by category
  - sk help <command> shows detailed help for specific commands with usage, options, and examples
  - Global --help and -h flags supported
  - Created src/solokit/commands/help.py with comprehensive command documentation
- Added sk version command to display version information
  - Shows Solokit version, Python version, and platform
  - Global --version and -V flags supported
  - Created src/solokit/commands/version.py
- Added sk doctor command for comprehensive system diagnostics
  - Checks Python version (>= 3.11.0), git installation, project structure
  - Validates config.json and work_items.json integrity
  - Verifies quality tools availability (pytest, ruff)
  - Provides actionable suggestions for failed checks
  - Returns exit code 0 if all pass, 1 if any fail
  - Created src/solokit/commands/doctor.py
- Added sk config show command to display configuration
  - Shows config file path and formatted configuration
  - Validates configuration and displays status
  - --json flag for machine-readable output
  - Created src/solokit/commands/config.py
- Updated CLI routing in src/solokit/cli.py to support new commands
- Running sk with no arguments now shows help (instead of error)
- Added 27 new unit tests for all commands (100% passing)
- Updated README.md with utility commands section
- Updated docs/guides/troubleshooting.md to reference sk doctor as first troubleshooting step
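
As an illustration of the exit-code behavior described for sk doctor, here is a minimal sketch of a check runner. The function names and the shape of a "check" are assumptions; the real doctor.py may be organized differently:

```python
import shutil
import sys
from typing import Callable

# Assumed shape: each check returns (passed, suggestion shown on failure).
Check = Callable[[], tuple[bool, str]]


def check_python() -> tuple[bool, str]:
    return sys.version_info >= (3, 11), "Install Python 3.11 or newer"


def check_git() -> tuple[bool, str]:
    return shutil.which("git") is not None, "Install git and add it to PATH"


def run_checks(checks: dict[str, Check]) -> int:
    """Run every check, print one result line each, and return an exit code."""
    failed = 0
    for name, check in checks.items():
        passed, suggestion = check()
        print(f"[{'ok' if passed else 'FAIL'}] {name}")
        if not passed:
            print(f"      suggestion: {suggestion}")
            failed += 1
    return 0 if failed == 0 else 1
```

A caller would typically finish with sys.exit(run_checks({"python": check_python, "git": check_git})), so scripts and CI can branch on the result.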

Fixed

High: Urgent Flag Not Cleared on Session Completion
- Fixed urgent flag persisting on completed work items when using sk end command
  - Session completion now uses WorkItemUpdater instead of direct JSON manipulation
  - Ensures urgent flag is automatically cleared when work item status changes to completed
  - Behavior now consistent with sk work-update <id> --status completed
  - Updated src/solokit/session/complete.py to use repository pattern for status updates
- Added --set-urgent flag to sk work-update command for setting urgent status
  - Allows promoting existing work items to urgent status
  - Automatically clears urgent flag from other items (single-item constraint)
  - Complements existing --clear-urgent flag for complete CLI control
  - Updated src/solokit/work_items/updater.py with set_urgent field handling
- Updated help documentation for urgent flags
  - Added --set-urgent and --clear-urgent to work-update command help
  - Added --urgent flag to work-new command help examples
  - Updated src/solokit/commands/help.py with complete option descriptions
  - Updated .claude/commands/work-update.md and template version
- Added integration test for urgent flag clearing on session completion
  - New test: test_auto_clear_urgent_on_session_completion in test_urgent_workflow.py
  - Verifies end-to-end workflow with session completion
  - All 12 urgent workflow integration tests passing
- Impact: Work lists now correctly show/hide ⚠️ symbol based on actual urgent status
- Users no longer need manual cleanup after completing urgent work items
- Complete CLI support for urgent flag lifecycle (create, set, clear, auto-clear)
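
The root cause in this entry is a common one: a side effect lives in one code path, and a second path bypasses it by editing the JSON directly. A minimal sketch of the pattern, with hypothetical function names and dict-based items standing in for WorkItemUpdater and the real data model:

```python
def update_status(item: dict, status: str) -> dict:
    """Hypothetical single entry point for all status changes."""
    item["status"] = status
    if status == "completed":
        # Side effect that a direct JSON edit would silently skip.
        item["urgent"] = False
    return item


def complete_session(work_items: dict[str, dict], item_id: str) -> None:
    """Session completion delegates to update_status() instead of mutating the dict."""
    update_status(work_items[item_id], "completed")
```

Routing both the session-completion path and the work-update path through one function keeps their behavior identical by construction.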
Critical: Next.js 16 Template Initialization Issues
- Fixed missing ts-node dependency causing Jest to fail parsing TypeScript config files
  - Added "ts-node": "10.9.2" to devDependencies in all 15 Next.js package.json templates
  - Affects all 3 Next.js templates (saas_t3, fullstack_nextjs, dashboard_refine) × 5 tiers each
  - Resolves error: Jest: 'ts-node' is required for the TypeScript configuration files
- Fixed deprecated next lint command removed in Next.js 16
  - Changed "lint": "next lint" to "lint": "eslint ." in all 15 templates
  - Updated "lint:fix" script in dashboard_refine templates to use direct ESLint
  - Resolves cryptic error: Invalid project directory provided, no such directory: .../lint
- Fixed ESLint 9 incompatibility with legacy config format
  - Replaced .eslintrc.json (legacy) with eslint.config.mjs (flat config) in all 3 templates
  - Added "globals": "16.5.0" package to tier-1-essential in all 3 templates
  - Configured proper globals for Node.js, browser, React, and Jest environments
  - Resolves error: ESLint couldn't find an eslint.config.(js|mjs|cjs) file
- Fixed linting validation being skipped during quality gates check
  - Updated src/solokit/init/session_structure.py to include linting commands in quality gates config
  - Added commands section with language-specific linting commands (python, javascript, typescript)
  - Validation now properly runs npm run lint instead of reporting "no command for typescript"
- Fixed linting errors in template example code
  - Removed unused ctx parameters from tRPC example router (saas_t3 template)
  - Template code now passes linting without errors after initialization
- Impact: All 3 Next.js templates now work correctly across all quality tiers (base through tier-4)
- Users can successfully initialize projects without manual workarounds
- Quality gates validation (/validate, /end) now properly check linting instead of skipping
- Linting works out-of-the-box with ESLint 9 flat config
- All 2,936 tests passing with zero regressions
Critical: CI/CD Workflow Failures in Template Projects
- Fixed CodeQL permission error causing Security workflow to fail on push to main
  - Added actions: read permission to CodeQL jobs in all template security.yml files
  - Resolves error: "Resource not accessible by integration" when accessing workflow metadata
  - Affects: saas_t3, fullstack_nextjs, dashboard_refine templates
- Fixed CodeQL and secrets-scan jobs running on pull requests without required permissions
  - Added if: github.event_name != 'pull_request' conditional to skip on PRs
  - These jobs require write permissions not available in PRs from forks
  - Prevents workflow failures on external contributions
  - Affects: saas_t3, fullstack_nextjs, dashboard_refine templates
- Fixed dependency-review failing on repositories without GitHub Advanced Security
  - Added continue-on-error: true to dependency-review step
  - Allows workflow to pass even when Advanced Security is not available (free repositories)
  - Resolves error: "Dependency review is not supported on this repository"
  - Affects: saas_t3, fullstack_nextjs, dashboard_refine templates
- Fixed Deploy workflow failures when production secrets are not configured
  - Added conditionals to skip deployment steps when secrets are empty/missing
  - Database migrations: if: ${{ secrets.DATABASE_URL != '' }}
  - Vercel deployment: if: ${{ secrets.VERCEL_TOKEN != '' }}
  - Sentry releases: if: ${{ secrets.SENTRY_AUTH_TOKEN != '' }}
  - Lighthouse CI: if: ${{ secrets.LHCI_GITHUB_APP_TOKEN != '' }}
  - Python templates: STAGING_DATABASE_URL, RAILWAY_TOKEN, DOCKER_REGISTRY, DEPLOY_KEY
  - Affects: saas_t3, fullstack_nextjs, dashboard_refine, ml_ai_fastapi templates
- Fixed missing npm script errors in test and build workflows
  - Changed to npm run --if-present test:integration for integration tests
  - Changed to npm run --if-present test:e2e for E2E tests
  - Changed to npm run --if-present analyze for bundle analysis
  - Scripts gracefully skip if not defined in package.json (tier-1/tier-2 projects)
  - Resolves errors: "Missing script: test:integration/test:e2e/analyze"
  - Affects: saas_t3, fullstack_nextjs, dashboard_refine templates
- Impact: New projects can now merge PRs without CI failures
- All CI workflows pass on tier-1-essential projects (base configuration)
- Deploy workflows gracefully skip steps when production infrastructure isn't configured yet
- Users can set up production secrets and advanced test suites incrementally without errors
- Fixed 11 workflow files across 4 templates (security.yml, deploy.yml, test.yml, build.yml)
Critical: Phase 2 Terminal Testing - Final 11 UX Issues (All 18 Issues Now Complete)
- Fixed .session/ directory causing uncommitted changes warnings (#9 - Critical)
  - Added .session/ to .gitignore in all 4 stack templates (saas_t3, ml_ai_fastapi, dashboard_refine, fullstack_nextjs)
  - Templates now properly exclude session tracking from git by default
- Fixed DOT syntax error in work-graph SVG generation (#4/#5 - Critical)
  - Changed from invalid "bold, color=red" to valid DOT syntax 'style=bold, color=red'
  - Updated src/solokit/visualization/dependency_graph.py:169
  - SVG graph generation now works correctly with Graphviz
- Changed uncommitted changes from ERROR to INFO level in sk start (#8 - High)
  - Updated src/solokit/session/briefing/git_context.py to handle WorkingDirNotCleanError gracefully
  - Users no longer see ERROR logs for normal uncommitted changes during development
- Added progress messaging and Claude Code promotion to sk init (#1 - High)
  - Added initial progress message during initialization
  - Changed final messages to use output.info() instead of logger.info() for visibility
  - Updated src/solokit/init/orchestrator.py with user-facing completion summary
- Added warning when dependency already exists in work-update (#2 - Medium)
  - Shows output.warning("Dependency 'X' already exists (skipped)") instead of silently skipping
  - Updated src/solokit/work_items/updater.py
- Replaced verbose output with compact table format in work-next (#6 - Medium)
  - New table shows ID, Type, Priority, Status, Blocks, and Title columns
  - Displays top 5 ready items and top 3 blocked items
  - Arrow (→) marks the recommended item
  - Updated src/solokit/work_items/scheduler.py
- Added interactive prompt to work-delete when no flags provided (#12 - Medium)
  - Users now get choices: 1=keep spec, 2=delete spec, 3=cancel
  - No longer requires --keep-spec or --delete-spec flags (but still accepts them)
  - Updated src/solokit/work_items/delete.py with user-friendly menu
- Removed redundant ERROR/WARNING logs in edge cases (#14/#15/#16 - Medium)
  - Removed duplicate logging before user-facing error messages
  - Updated query.py, updater.py, and delete.py to avoid log duplication
  - Changed "No changes made" to "No changes to update" for clarity
- Updated work-graph to use HelpfulArgumentParser for better errors (#17 - Medium)
  - Invalid format errors now show examples instead of raw argparse output
  - Updated src/solokit/visualization/dependency_graph.py
- Improved "no results" message in learn-search (#11 - Low)
  - Now suggests trying different keywords or browsing all learnings
  - Updated src/solokit/learning/reporter.py
- Added validation for empty query in learn-search (#18 - Low)
  - Shows error with examples when query is empty or whitespace-only
  - Updated src/solokit/learning/curator.py
- Test updates: Fixed 1 test in test_briefing_generator.py to match new git status message
- All 2,388 tests passing with zero regressions
- Quality checks: All ruff linting passed, all formatting compliant, all mypy checks passed
- Impact: Completes all 18 Phase 2 terminal testing issues for professional CLI UX
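
For the DOT syntax fix in this entry (#4/#5): node attributes in DOT are comma-separated key=value pairs, so a bare "bold" is rejected while style=bold is accepted. A small sketch of emitting such nodes; the function names and the highlight parameter are hypothetical, and labels are assumed not to contain double quotes:

```python
def node_line(item_id: str, title: str, highlight: bool = False) -> str:
    """Render one DOT node statement with key=value attributes."""
    attrs = [f'label="{title}"']
    if highlight:
        # "bold" alone is not an attribute; it must be written as style=bold.
        attrs += ["style=bold", "color=red"]
    return f'  "{item_id}" [{", ".join(attrs)}];'


def to_dot(nodes: list[tuple[str, str, bool]], edges: list[tuple[str, str]]) -> str:
    """Build a complete digraph from (id, title, highlight) nodes and (src, dst) edges."""
    lines = ["digraph work_items {"]
    lines += [node_line(*node) for node in nodes]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)
```

The resulting string can be passed to Graphviz (for example dot -Tsvg) to produce the SVG output.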
Critical: Phase 2 Terminal Testing - Clean Output, Archiver Fix, and Briefing Improvements
- Fixed log leakage issue where INFO/WARNING/ERROR logs appeared in all commands without --verbose flag
  - Changed default CLI log level from INFO to ERROR for clean terminal output
  - Removed redundant logging configuration from validate.py
  - Updated src/solokit/cli.py to set ERROR level by default, DEBUG with --verbose
  - Only ERROR and above messages shown to users unless explicitly requesting verbose mode
- Fixed archiver type comparison error causing learning curation to fail
  - Updated src/solokit/learning/archiver.py to handle new session dict format
  - Changed from comparing dict objects directly to extracting session_num field first
  - Resolves '>' not supported between instances of 'dict' and 'int' error
- Fixed work-list count logic to include blocked items in not_started category
  - Updated src/solokit/work_items/query.py to count items by actual status
  - Blocked is now correctly treated as a property, not a separate status
  - Count math now accurate: total = in_progress + not_started + completed
- Added template comment stripping to briefing output for cleaner specs
  - Created strip_template_comments() method in src/solokit/session/briefing/formatter.py
  - Removes HTML comments, placeholder text, and excessive blank lines from specs
  - Briefings now ~5x shorter and more readable without template cruft
- Verified work-graph documentation already matches implementation (ascii, dot, svg formats)
- Added comprehensive regression test suite: tests/integration/test_phase_2_terminal_fixes.py
  - 15 new tests covering all 5 issues
  - Updated 5 existing tests to use new session dict format
- All 2,388 tests passing with zero regressions
- Quality checks: All ruff linting passed, all formatting compliant, all mypy checks passed
- Impact: Resolves 5 critical Phase 2 terminal testing issues for professional CLI UX
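
To illustrate the template comment stripping in this entry, here is a rough sketch of what a function like strip_template_comments() could do. Only the function name comes from this changelog; the body is an assumption, and it covers HTML comments and blank-line runs but not placeholder text, whose format is not specified here:

```python
import re

# Non-greedy match so each comment is removed separately; DOTALL spans newlines.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
EXTRA_BLANK_LINES = re.compile(r"\n{3,}")


def strip_template_comments(spec: str) -> str:
    """Drop HTML comments, then collapse runs of blank lines to a single one."""
    without_comments = HTML_COMMENT.sub("", spec)
    return EXTRA_BLANK_LINES.sub("\n\n", without_comments).strip()
```

Removing comments first matters: deleting a multi-line comment usually leaves a run of empty lines behind, which the second pass then collapses.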
Critical: Phase 1 Terminal Testing - Error Messaging & UX Improvements
- Fixed missing jsonschema>=4.20.0 dependency causing all learning commands to fail
- Enhanced argparse error messages with helpful examples and next steps:
  - Created src/solokit/core/argparse_helpers.py with HelpfulArgumentParser class
  - Updated sk work-new, sk work-show, sk work-update, sk work-delete with example-rich epilogs
  - All argparse errors now show full help text with examples instead of raw usage
- Improved Python binary detection for cross-platform compatibility:
  - Created src/solokit/core/system_utils.py with get_python_binary() function
  - Updated get_metadata.py, get_next_recommendations.py, get_dependencies.py to detect python vs python3
  - Error messages now show correct binary based on system availability
- Added --debug flag to sk validate; stack traces are now hidden from end users unless the flag is passed
- Implemented context-aware "no work item" error messages:
  - sk start: Differentiates between "no items exist" vs "items exist but blocked"
  - sk status: Shows total item count and actionable next steps
  - sk end: Provides complete workflow guidance instead of "Work item not found: None"
  - sk work-next: Helpful creation steps instead of generic "No work items found."
  - sk work-list: Replaced the incorrect "/work-item create" command reference with a helpful message
  - sk work-graph: Context-aware message differentiating no items vs filtered results
- All error messages now include:
  - Numbered action steps for both terminal (sk commands) and Claude Code (slash commands)
  - Emoji hints (⚠️, 💡) for visual guidance
  - Specific next steps instead of generic warnings
- Test updates: Fixed 1 test in test_status.py to match improved error messages
- All 2,155 unit tests passing
- Impact: Resolves 13 out of 19 Phase 1 terminal testing issues
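
The HelpfulArgumentParser idea from this entry can be approximated by overriding argparse's error() hook. This is a sketch under assumptions: only the class name comes from this changelog, while the epilog text, the work_id argument, and the example ID are made up for illustration:

```python
import argparse
import sys


class HelpfulArgumentParser(argparse.ArgumentParser):
    """ArgumentParser that prints full help (including the epilog) on errors."""

    def error(self, message: str):
        # The default implementation prints only the usage line.
        self.print_help(sys.stderr)
        self.exit(2, f"\nError: {message}\n")


def build_parser() -> argparse.ArgumentParser:
    parser = HelpfulArgumentParser(
        prog="sk work-show",
        epilog="Example:\n  sk work-show feature_login",  # hypothetical ID
        formatter_class=argparse.RawDescriptionHelpFormatter,  # keep epilog line breaks
    )
    parser.add_argument("work_id", help="ID of the work item to display")
    return parser
```

Calling build_parser().parse_args([]) prints the full help with the example to stderr and exits with status 2, the same status argparse uses by default.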

Added

Feature: UX Enhancements - Logger Shortening, Interactive Prompts, and Claude Code Promotion
- Shortened logger names for better terminal readability (e.g., "orchestrator" vs "solokit.init.orchestrator")
- Added questionary library for rich interactive CLI prompts with styled UI components
- Created src/solokit/core/cli_prompts.py utility module with 4 reusable functions:
  - confirm_action(): Styled confirmation prompts with default fallback
  - select_from_list(): Single-select lists with arrow key navigation
  - multi_select_list(): Multi-select checkboxes for multiple options
  - text_input(): Text input with optional validation and defaults
- Replaced basic input() calls in src/solokit/project/init.py with questionary prompts:
  - Template selection now uses interactive list selection
  - Quality tier selection with rich descriptions
  - Coverage target selection with visual list
  - Additional options use multi-select checkboxes
  - Final confirmation with styled yes/no prompt
- Added Claude Code promotion to initialization completion:
  - Prominent messaging after sk init completes
  - Lists key slash commands (/start, /end, /work-new, /work-list)
  - Includes link to https://claude.com/claude-code
  - Better flow: Claude Code promotion → Next Steps
- Enhanced README.md with Claude Code positioning:
  - Added "💡 Best Used with Claude Code" hero section with Quick Start variant
  - Enhanced Prerequisites to strongly recommend Claude Code (not just required)
  - Added "vs. Using Claude Code Standalone" comparison explaining workflow benefits
  - Repositioned documentation to emphasize Claude Code as primary interface
- All prompts gracefully fall back to defaults in non-interactive environments (CI/CD, piped stdin)
- Added EOF/KeyboardInterrupt error handling for robust test execution
- Test suite: 2,373 tests passing (added 17 new tests for cli_prompts module)
- Quality: All ruff linting passed, all mypy checks passed with modern type annotations
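
The non-interactive fallback described in this entry can be sketched without any prompt library. The version below uses plain input() as a stand-in for questionary, and the stdin and ask parameters are hypothetical additions that make the behavior easy to exercise in tests:

```python
import sys
from typing import Callable, TextIO


def confirm_action(
    message: str,
    default: bool = False,
    stdin: TextIO = sys.stdin,
    ask: Callable[[str], str] = input,
) -> bool:
    """Ask a yes/no question, returning `default` when nobody can answer."""
    if not stdin.isatty():
        # CI/CD or piped stdin: never block waiting for input.
        return default
    try:
        answer = ask(f"{message} [{'Y/n' if default else 'y/N'}] ").strip().lower()
    except (EOFError, KeyboardInterrupt):
        return default
    if not answer:
        return default
    return answer in {"y", "yes"}
```

Checking isatty() before prompting is what keeps initialization from hanging in pipelines, and the except clause covers the case where a terminal is present but input ends unexpectedly.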

Fixed

Quality: Complete code quality and test suite cleanup
- Fixed all linting issues: Replaced deprecated typing.List with built-in list type in 3 template files
- Fixed all mypy type errors (17 errors across 6 files):
  - Updated pyproject.toml: Replaced deprecated strict_concatenate with extra_checks
  - Fixed exceptions.py: Changed implicit Optional returncode: int = None to explicit int | None = None
  - Added type casting in template_installer.py and dependency_installer.py for json.load() and yaml.safe_load() returns
  - Enhanced return type in environment_validator.py: dict[str, bool | str] → dict[str, bool | str | None | list[str]]
  - Added Literal type casting in orchestrator.py for stack_type and tier parameters
- Fixed test failures (3 tests):
  - Fixed mock fixtures: Changed exit_code to returncode in 6 test mocks
  - Updated conftest.py: Aligned mock_stack_versions with actual stack-versions.yaml structure (base, tier1-4 instead of all_tiers/tier4)
- Removed all legacy init tests (12 tests deleted):
  - Deleted TestGitignoreGeneration class (8 tests) from test_init_workflow.py
  - Deleted TestGitInitialization class (3 tests) from test_init_workflow.py
  - Deleted TestCompleteInitWorkflow test (1 test) from test_init_workflow.py
- Fixed E2E test fixtures to avoid legacy init (25 tests un-skipped):
  - Updated fixtures in test_core_session_workflow.py, test_learning_system.py, test_work_item_system.py
  - Fixtures now manually create .session directory structure instead of calling deprecated sk init
  - Added all required tracking files with proper structure (work_items.json, learnings.json, status_update.json, stack.txt, tree.txt)
- Test suite results: 2,954 tests passing, 0 failed, 0 skipped (previously 2,368 passing, 35 skipped)
- Quality checks: All ruff linting passed, all 247 files formatted, all mypy checks passed (106 source files)
- Benefits: Clean codebase with modern Python type hints, zero legacy code, 100% test success rate
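
Two of the mypy fixes in this entry follow patterns worth showing. The class and function names below are hypothetical; only the returncode annotation change and the use of casts on json.load() results come from this changelog:

```python
from __future__ import annotations

import json
from typing import Any, cast


class CommandError(Exception):
    """Hypothetical exception; the real class in exceptions.py may differ."""

    def __init__(self, message: str, returncode: int | None = None) -> None:
        # Before the fix the annotation was `returncode: int = None`,
        # which mypy rejects because None is not an int.
        super().__init__(message)
        self.returncode = returncode


def load_config(text: str) -> dict[str, Any]:
    """Parse JSON and record the expected shape for the type checker."""
    # json.loads() is typed as returning Any; cast() does no runtime validation.
    return cast(dict[str, Any], json.loads(text))
```

Note that cast() only informs the type checker. If the JSON could be a list or a scalar, an isinstance check is still needed at runtime.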

Added

Feature: Claude Code Interactive UI Integration
- Integrated Claude Code's AskUserQuestion tool to replace Python's interactive terminal prompts with rich UI components
- Updated 6 slash commands with interactive workflows:
  - /work-new: Interactive dependency and metadata selection with AI-powered suggestions
  - /work-update: Multi-select field updates (status, priority, milestone, dependencies)
  - /work-delete: Shows dependent work items with warning before deletion
  - /end: Work item completion status selection (completed/in-progress/cancel)
  - /learn: AI-generated learning suggestions with multi-select capture
  - /start: Interactive work item recommendations (top 4 ready items by priority)
- Created 4 optimization scripts to avoid reading full JSON files:
  - get_metadata.py: Fast work item metadata retrieval (~10 lines vs 1,751 lines)
  - get_dependencies.py: Quick dependency lookup with filtering and status
  - get_dependents.py: Find work items that depend on a given item
  - get_next_recommendations.py: Get top N ready work items by priority
- Removed all Python input() calls from command modules (creator.py, updater.py, delete.py, complete.py)
- All commands now require explicit CLI arguments with no interactive fallbacks
- Updated command files (.claude/commands/*.md) with declarative AskUserQuestion workflows
- Added 53 comprehensive unit tests for optimization scripts
- All 2,226 tests passing (1,996 unit + 140 integration + 90 e2e)
- Full type safety maintained with mypy strict mode
- Benefits: Rich interactive UI for Claude Code users, better UX with multi-select options, AI-generated suggestions, optimized performance

Changed

Session Completion: /sk:end now defaults to marking work items as completed
- Non-interactive mode (e.g., when run by Claude Code) now defaults to marking work items as "completed" instead of "in-progress"
- This aligns with the most common use case where developers end sessions after completing their work
- Use the --incomplete flag explicitly to keep work items as "in-progress" for multi-session work
- Interactive mode behavior unchanged (still defaults to completed as choice 1)
- Updated src/solokit/session/complete.py:943 to return True in non-interactive mode
- Updated documentation in .claude/commands/end.md to reflect new default behavior
- Updated test test_prompt_non_interactive_defaults_true in tests/unit/session/test_complete.py

Added

Performance: Comprehensive optimization for session operations
- Created src/solokit/core/cache.py with thread-safe TTL-based caching:
  - Cache class with get/set/invalidate/clear operations
  - FileCache class with automatic modification time tracking
  - Global cache instance accessible via get_cache()
- Created src/solokit/core/performance.py for performance monitoring:
  - @measure_time() decorator for automatic function timing
  - Timer context manager for code block timing
  - Automatic logging for operations >100ms (info) and >1s (warning)
- Enhanced src/solokit/learning/similarity.py with caching optimizations:
  - Added _word_cache to cache word sets during merge operations
  - Pre-compute word sets once per category (O(n) instead of O(n²))
  - Reduced similarity checking from 4,950 operations to ~100 for 100 learnings
- Enhanced src/solokit/work_items/repository.py with file caching:
  - load_all() uses FileCache with modification tracking
  - Eliminates 11+ repeated file loads per operation
  - save_all() automatically invalidates cache
- Added 91 comprehensive tests:
  - 16 cache module tests (TTL, thread safety, file caching)
  - 13 performance module tests (decorator, timer, exception handling)
  - Enhanced similarity tests with word cache validation
- Performance improvements:
  - Similarity checking: 30-50x faster for large learning datasets
  - File I/O: 10x reduction with intelligent caching
  - Automatic performance monitoring built-in across codebase
- All 1,980 unit tests passing, full type safety with mypy strict mode

Changed

Refactor: Extract constants and remove magic values - Complete centralization
- Created comprehensive src/solokit/core/constants.py module with 31 constants organized into 9 categories
- Replaced 50+ magic timeout values and hardcoded path strings across 27 files with named constants
- Added 8 helper functions for type-safe path construction (e.g., get_session_dir(), get_work_items_file())
- Organized constants into logical categories:
  - Git operation timeouts (3): Quick/Standard/Long (5s/10s/30s)
  - Quality gate timeouts (5): From 5s checks to 20min test runs
  - Integration testing timeouts (5): Docker, fixtures, cleanup operations
  - Session workflow timeouts (4): Status, completion, learning extraction
  - Project initialization timeouts (3): Stack detection, tree/graph generation
  - Performance testing (4): Regression thresholds, test timeouts
  - Learning system (5): Curator settings, similarity thresholds
  - Directory and file paths (11): Session directory structure
- Updated files across all major modules:
  - Core: git/integration.py (13 replacements), session/validate.py
  - Quality: All 8 checker modules + gates.py (22 replacements)
  - Session: complete.py, status.py, briefing modules (8 replacements)
  - Testing: performance.py, integration_runner.py (9 replacements)
  - Other: learning, visualization, project modules (4 replacements)
- All constants use Final type annotations for type safety
- Benefits: Single source of truth, self-documenting code, easier maintenance, improved readability
- All 2,180 tests passing, zero linting issues, clean formatting
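
As a rough illustration of the pattern described above, named constants with `Final` annotations plus small path helpers might look like the following. The constant names and the `tracking` subdirectory are assumptions; only the 5s/10s/30s git timeouts and the helper names `get_session_dir()` and `get_work_items_file()` come from the entry itself, and their real signatures may differ.

```python
from pathlib import Path
from typing import Final

# Git operation timeouts in seconds (quick / standard / long).
# The constant names are hypothetical; the values are from the entry above.
GIT_QUICK_TIMEOUT: Final[int] = 5
GIT_STANDARD_TIMEOUT: Final[int] = 10
GIT_LONG_TIMEOUT: Final[int] = 30

# Directory and file names (names of the constants are hypothetical).
SESSION_DIR_NAME: Final[str] = ".session"
TRACKING_DIR_NAME: Final[str] = "tracking"  # assumed layout
WORK_ITEMS_FILENAME: Final[str] = "work_items.json"


def get_session_dir(project_root: Path) -> Path:
    """Return the session directory for a project root."""
    return project_root / SESSION_DIR_NAME


def get_work_items_file(project_root: Path) -> Path:
    """Return the work items file path (assumes a tracking subdirectory)."""
    return get_session_dir(project_root) / TRACKING_DIR_NAME / WORK_ITEMS_FILENAME
```

Because the names are annotated with `Final`, a type checker such as mypy reports any later reassignment, which is what makes the module usable as a single source of truth.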
Refactor: Complete logging consistency refactor - 100% migration to structured logging
- Migrated all 502 print() statements across 30 files to new structured logging/output system
- Separated user-facing output from diagnostic logging for better maintainability:
  - Created OutputHandler class in src/solokit/core/output.py for user-facing messages (stdout/stderr)
  - Enhanced logging_config.py with structured logging, JSON formatting, and context management
- Migrated 21 additional files across 4 batches in Session 29:
  - Batch 1 (100 statements): reporter.py, dependency_graph.py, tree.py
  - Batch 2 (37 statements): config_validator.py, cli.py, error_formatter.py, stack.py
  - Batch 3 (24 statements): milestones.py, curator.py, repository.py, work_items stragglers
  - Batch 4 (38 statements): env_validator.py, executor.py, performance.py, exceptions.py, and 4 others
- Fixed all migration issues:
  - Corrected indentation errors and incomplete f-strings from automated migration
  - Fixed variable shadowing bug in dependency_graph.py (output vs graph_output)
  - Added missing output = get_output() initialization in 8+ modules
  - Updated 45 tests to work with new output system instead of capturing stdout
- All 2,180 tests passing (100% pass rate) after migration
- Passed all quality gates: ruff linting, mypy type checking, code formatting
- Benefits: Cleaner separation of concerns, consistent user experience, better diagnostic logging, structured log output support
Refactor: Decompose manager.py god-class into modular architecture
- Decomposed monolithic 1,212-line WorkItemManager god-class into 8 focused, single-responsibility modules
- Created 7 new specialized modules: repository.py, creator.py, validator.py, query.py, updater.py, scheduler.py, milestones.py
- Refactored main manager.py from 1,212 to 260 lines (a 79% reduction) by delegating to specialized modules
- Implemented dependency injection pattern with clear module responsibilities:
  - WorkItemRepository: Data access and persistence layer (CRUD operations) (235 lines)
  - WorkItemCreator: Interactive and non-interactive work item creation with prompts (436 lines)
  - WorkItemValidator: Validation logic for integration tests and deployments (197 lines)
  - WorkItemQuery: Listing, filtering, searching, sorting, and display (389 lines)
  - WorkItemUpdater: Update operations with field validation (211 lines)
  - WorkItemScheduler: Work queue management and next item selection (176 lines)
  - MilestoneManager: Milestone CRUD operations and progress tracking (133 lines)
- Created comprehensive test suite: 168 new unit tests for all new modules (213 tests total, up from 111)
- Added 4 new test files: test_repository.py, test_creator.py, test_query.py, test_milestones.py
- Updated test_manager.py to focus on integration testing of the orchestration layer (45 integration tests)
- Fixed 4 mypy type annotation errors in repository.py for strict type checking compliance
- All 2,165 tests passing (100% pass rate) including 213 work_items module tests
- Maintained full backward compatibility with existing WorkItemManager public API
- Benefits: Single responsibility principle, improved testability, better code navigation, extensibility, loose coupling, easier maintenance
Refactor: Decompose learning curator god-class into modular architecture
- Decomposed monolithic 1,226-line LearningsCurator god-class into 8 focused, single-responsibility modules
- Created 6 new specialized modules: categorizer.py, archiver.py, extractor.py, repository.py, reporter.py, validator.py
- Refactored main curator.py from 1,226 to 369 lines (a 70% reduction) by delegating to specialized modules
- Implemented dependency injection pattern with clear module responsibilities:
  - LearningCategorizer: Auto-categorization with keyword scoring (124 lines)
  - LearningArchiver: Archive management for old learnings (116 lines)
  - LearningExtractor: Extract from sessions, git commits, code comments (343 lines)
  - LearningRepository: CRUD operations and data persistence (247 lines)
  - LearningReporter: Reports, statistics, search, timeline (349 lines)
  - LearningValidator: Validation logic and JSON schema (142 lines)
- Added 13 compatibility wrapper methods to maintain backward compatibility with existing tests
- Fixed FileOperationError exception handling in extractor for graceful JSON parsing failures
- All 2,143 tests passing (100% pass rate) including 212 learning-related tests
- Fixed all quality issues: ruff formatting (4 files), mypy type checking (2 errors)
- Benefits: Single responsibility principle, improved testability, better code navigation, extensibility, loose coupling
Refactor: Complete Quality Gates modularization into specialized checker architecture
- Decomposed monolithic 1,370-line gates.py god class into 10 focused, single-responsibility checker classes
- Created modular checker architecture with abstract QualityChecker base class and CheckResult dataclass
- Implemented 10 specialized checkers: SecurityChecker, ExecutionChecker, LintingChecker, FormattingChecker, DocumentationChecker, SpecCompletenessChecker, CustomValidationChecker, Context7Checker, IntegrationChecker, DeploymentChecker
- Refactored main gates.py from 1,370 to 611 lines (-55%) by delegating to specialized checkers
- Removed legacy gates_legacy.py (1,370 lines) after successfully migrating all functionality
- Created reporter infrastructure: ConsoleReporter and JSONReporter for flexible output formatting
- Added ResultAggregator for combining and analyzing checker results
- Implemented dependency injection pattern with optional CommandRunner parameter for fast, isolated testing
- Created comprehensive test suite: 220 new unit tests for all checker modules (360 tests total, up from 140)
- Achieved 95%+ code coverage across all new modules (100% on 4 checkers, 94-99% on others)
- Fixed all quality issues: ruff linting (91 errors), black formatting (28 files), mypy type checking (27 errors)
- Renamed TestRunner to ExecutionChecker to avoid pytest collection warnings
- Added configuration dataclasses: Context7Config, IntegrationConfig, DeploymentConfig
- All 360 tests passing (100% pass rate) with 0.40s execution time
- Maintained full backward compatibility with existing QualityGates interface
- Benefits: Single responsibility principle, easy to test, pluggable architecture, clear separation of concerns, type-safe, highly maintainable
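
The checker architecture above (an abstract base class, a result dataclass, and an injected command runner) can be sketched as follows. The field names on `CheckResult`, the method names, and the use of a plain callable instead of the project's CommandRunner class are assumptions for illustration only.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical runner signature: takes an argv list, returns (exit_code, output).
RunCommand = Callable[[list[str]], tuple[int, str]]


@dataclass
class CheckResult:
    checker_name: str
    passed: bool
    errors: list[str] = field(default_factory=list)


class QualityChecker(ABC):
    """Base class: each checker runs one kind of quality check."""

    def __init__(self, run_command: RunCommand) -> None:
        # The runner is injected so tests can pass a fake instead of a subprocess.
        self._run_command = run_command

    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def run(self) -> CheckResult: ...


class LintingChecker(QualityChecker):
    def name(self) -> str:
        return "linting"

    def run(self) -> CheckResult:
        exit_code, output = self._run_command(["ruff", "check", "."])
        errors = [] if exit_code == 0 else output.splitlines()
        return CheckResult(self.name(), passed=exit_code == 0, errors=errors)


def fake_runner(argv: list[str]) -> tuple[int, str]:
    """Test double that simulates one lint finding without running ruff."""
    return 1, "src/app.py:1:1: F401 unused import"


result = LintingChecker(fake_runner).run()
```

Injecting the runner is what allows a checker to be tested in isolation without spawning a subprocess.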
Refactor: Extract learning similarity engine into dedicated module
- Created new src/solokit/learning/similarity.py module with reusable similarity detection algorithms
- Implemented JaccardContainmentSimilarity class with configurable thresholds and stopword filtering
- Implemented LearningSimilarityEngine with caching, pluggable algorithms, and Protocol-based design
- Added comprehensive test suite (35 tests) covering similarity algorithms, caching, merging, and edge cases
- Refactored LearningsCurator to delegate similarity operations to the new engine
- Removed duplicate similarity logic from curator (simplified 4 methods, removed 1 internal method)
- Fixed all ruff linting issues (14 deprecated typing imports converted to modern syntax)
- Achieved 100% mypy type checking compliance with proper type annotations
- All 1783 tests passing with no regressions
- Benefits: Better separation of concerns, improved testability, reusable similarity algorithms
Refactor: Add comprehensive type hints across entire codebase
- Added complete type hint coverage to all 55 source files in the codebase (100% coverage)
- Fixed 348 mypy errors across 6 refactoring sessions, achieving 0 type checking errors
- Modernized type annotations: converted Optional[X] to X | None syntax (14 occurrences)
- Added from __future__ import annotations to 12 modules for forward reference support
- Fixed Priority enum comparison methods to accept object parameter for protocol compatibility
- Fixed ErrorContext.__exit__() return type to Literal[False] for strict context manager protocol
- Added explicit return type annotations to 100+ functions including nested functions
- Added type annotations for complex variables: dict[str, Any], list[dict[str, str]], etc.
- Used # type: ignore[no-any-return] for unavoidable Any returns from json.load() and yaml.safe_load()
- Applied ruff auto-formatting to 8 files for consistent code style
- All 1520 unit tests passing with no regressions
- Benefits: IDE autocomplete, early error detection, better refactoring safety, improved documentation

Added

Core Error Handling Infrastructure
- Implemented comprehensive SDDError exception hierarchy with 50+ specialized exception types
- Added ErrorCode enumeration with 40+ error codes for standardized error identification
- Added ErrorCategory system (SYSTEM, USER, VALIDATION, NETWORK) for error classification
- Implemented ErrorFormatter for consistent error display with exit code mapping
- Added error handling decorators (@log_errors, @convert_subprocess_errors, @convert_file_errors)
- Created structured logging integration with context preservation and exception chaining
- All exceptions include context dict, remediation guidance, and proper exit codes

Changed

Standardized Error Handling Migration (Phases 1-3)
- Migrated 33 production files from print() and return tuples to structured exception-based error handling
- Phase 1 (11 files): Core utilities and briefing components
- Phase 2 (8 files): Work item management and validation
- Phase 3A (5 files): Core business logic (git/integration, quality/gates, learning/curator, session/complete, work_items/manager)
- Phase 3B (3 files): Testing infrastructure
- Phase 3C (6 files): Project management and configuration
- Replaced 200+ print() error statements with proper exception raising
- Replaced 26 return tuple patterns with exception-based error handling
- Replaced 8 sys.exit() calls in business logic with exceptions (CLI entry points preserved)
- Replaced 75+ broad Exception catches with specific exception types or catch-and-reraise pattern
- Added @log_errors() decorators to 40+ key functions for structured logging
- Updated 9 test files with pytest.raises() patterns and exception validation
- Quality gates intentionally kept 47 return tuples for result aggregation (not errors)
- All 1750 tests passing (100% coverage maintained)

Fixed

Linting and Formatting
- Fixed 77 type annotation warnings (Optional[X] → X | None) using ruff --unsafe-fixes
- Added missing ValidationError import in session/briefing.py
- Formatted 31 files with ruff format for consistent code style
- All ruff checks passing with zero errors

Investigated

Dataclass Migration Analysis
- Investigated replacing dictionary-based data structures with Python dataclasses across the codebase
- Analysis identified 1,260 dictionary patterns across 57 files requiring migration
- Estimated effort: 30-35 hours with high risk of introducing bugs
- Decision: Deferred indefinitely - current dict-based approach is stable and well-tested
- Rationale: Low ROI for a working CLI tool, prefer TypedDict for gradual type improvements
- All 1,471 tests passing (1,333 unit + 138 integration)

Changed

Refactor: Consolidate subprocess execution with CommandRunner
- Replaced all direct subprocess.run() calls with centralized CommandRunner class
- Updated 10 production files to use CommandRunner for consistent command execution:
  - visualization/dependency_graph.py - Graphviz SVG generation
  - session/validate.py - Git status validation
  - session/status.py - Git diff operations
  - session/complete.py - Stack/tree updates and git operations
  - learning/curator.py - Git log extraction
  - testing/performance.py - wrk load testing and docker operations
  - testing/integration_runner.py - Docker-compose and test execution
  - project/tree.py - Tree command execution
  - project/stack.py - Language version detection
  - project/init.py - Git init and dependency installation
- Updated 9 test files with proper CommandRunner mocking patterns using CommandResult objects
- Benefits: consistent error handling, timeout management, retry logic, and centralized logging
- Fixed pytest collection warning by renaming TestExecutionConfig to ExecutionConfig
- All 1,563 tests passing with zero warnings
Refactor: Decompose briefing.py god-class into modular package
- Decomposed monolithic 1,166-line session/briefing.py into focused package structure with 9 modules
- Created session/briefing/ package with single-responsibility modules averaging ~150 lines each:
  - orchestrator.py - SessionBriefing class for coordinating components
  - work_item_loader.py - WorkItemLoader for loading and resolving work items
  - learning_loader.py - LearningLoader for loading and scoring relevant learnings
  - documentation_loader.py - DocumentationLoader for project docs discovery
  - stack_detector.py - StackDetector for technology stack detection
  - tree_generator.py - TreeGenerator for directory tree loading
  - git_context.py - GitContext for git status and branch operations
  - milestone_builder.py - MilestoneBuilder for milestone context
  - formatter.py - BriefingFormatter for text formatting and generation
- 100% backward compatibility maintained through wrapper functions in __init__.py
- Added GitStatus.PR_CLOSED and GitStatus.DELETED enum values for complete git workflow states
- Class-based API enables better testability, reusability, and dependency injection
- All 1,440 unit and integration tests passing with no regressions
- Created comprehensive migration guide in docs/development/BRIEFING_REFACTOR_MIGRATION_GUIDE.md
- Benefits: improved maintainability, testability, code organization, and extensibility
Refactor: Replace magic strings with type-safe enums
- Created comprehensive enum system in core/types.py with 4 enums: WorkItemType, WorkItemStatus, Priority, GitStatus
- Updated 12 modules to use type-safe enums instead of magic strings
- Priority enum supports comparison operations (<, >, <=, >=) for prioritization logic
- GitStatus enum updated to match actual workflow states (in_progress, ready_to_merge, ready_for_pr, pr_created, merged)
- All enums inherit from str for seamless JSON serialization compatibility
- Each enum provides .values() class method for validation and iteration
- 100% backward compatibility maintained - no changes to JSON data formats
- All 1,532 tests passing with no regressions
- Created comprehensive documentation in docs/development/ENUM_USAGE_GUIDE.md with usage patterns, examples, and migration guide
- Benefits: IDE autocomplete, type safety, easier refactoring, single source of truth for valid values
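
The properties described above (str inheritance for JSON compatibility, a `.values()` class method, and ordered comparison on Priority) can be illustrated with a short sketch. The member names and their ordering are assumptions, as is the `rank` helper; the comparison methods take `object` as their parameter type, in line with the protocol fix noted elsewhere in this changelog.

```python
import json
from enum import Enum


class Priority(str, Enum):
    # Member names and values are assumed for illustration.
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

    @classmethod
    def values(cls) -> list[str]:
        return [member.value for member in cls]

    def rank(self) -> int:
        """Hypothetical helper: higher number means more urgent."""
        return ["low", "medium", "high", "critical"].index(self.value)

    # All four methods are defined explicitly because str already provides
    # alphabetical versions that would otherwise be inherited.
    def __lt__(self, other: object) -> bool:
        if not isinstance(other, Priority):
            return NotImplemented
        return self.rank() < other.rank()

    def __le__(self, other: object) -> bool:
        if not isinstance(other, Priority):
            return NotImplemented
        return self.rank() <= other.rank()

    def __gt__(self, other: object) -> bool:
        if not isinstance(other, Priority):
            return NotImplemented
        return self.rank() > other.rank()

    def __ge__(self, other: object) -> bool:
        if not isinstance(other, Priority):
            return NotImplemented
        return self.rank() >= other.rank()


payload = json.dumps({"priority": Priority.HIGH})
```

Since members are real strings, existing JSON files that store plain values such as "high" load back into the enum with `Priority("high")` and need no format change.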
Refactor: Centralized configuration management with ConfigManager
- Created core/config.py with singleton ConfigManager for centralized config loading
- Type-safe dataclasses for all config sections (QualityGatesConfig, CurationConfig, GitConfig)
- Caching mechanism to avoid redundant file reads with invalidation support
- Refactored 5 modules to use ConfigManager: quality/gates.py, git/integration.py, learning/curator.py, session/complete.py, session/validate.py
- 21 comprehensive unit tests for ConfigManager with 98% coverage
- Fixed 8 previously skipped tests in test suite
- Removed 3 obsolete test classes (duplicate config loading tests)
- All 1256 unit tests pass (up from 1248) with 0 skipped tests
- Net reduction of 183 lines of code through deduplication
- Eliminated duplicate config loading logic across modules
Refactor: Consolidated JSON file I/O operations
- Centralized all JSON file operations in core/file_ops.py with JSONFileOperations class
- Added FileOperationError exception for consistent error handling
- Enhanced features: atomic writes by default, optional validation hooks, automatic directory creation
- New load_json_safe() method for guaranteed return (never raises)
- Removed duplicate _load_json and _save_json methods from learning/curator.py
- 97% test coverage with 41 comprehensive unit tests
- All 1240 unit tests pass with no regressions
- Eliminated 100+ lines of duplicate code across the codebase
- Created comprehensive API reference documentation in docs/reference/file-operations-api.md
- Updated architecture documentation
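
Atomic writes and a never-raising loader are the two behaviors above that benefit most from an example. The sketch below uses free functions rather than the JSONFileOperations class; `save_json_atomic` and the `default` parameter are hypothetical names, and only `load_json_safe` is named in the entry (its real signature may differ).

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def save_json_atomic(path: Path, data: Any) -> None:
    """Write JSON to a temp file in the same directory, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)  # automatic directory creation
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
        # os.replace swaps the file in one step, so readers never see a partial write.
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_json_safe(path: Path, default: Any = None) -> Any:
    """Return parsed JSON, or the default if the file is missing or invalid."""
    try:
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)
    except (OSError, ValueError):
        # json.JSONDecodeError is a ValueError subclass.
        return default
```

Creating the temporary file in the destination directory keeps the final rename on one filesystem, which is what allows the replacement to be atomic.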

[0.1.0] - 2025-10-26

Note: Versions 0.6.0 and 0.7.0 were development versions that have been consolidated into the 0.1.x public release series.

Added

Enhanced session briefings with context continuity
- Previous Work section for in-progress items showing commits, file stats, and quality gates from prior sessions
- Enriched session summaries with full commit messages and file change statistics
- Enhanced learning relevance scoring using multi-factor algorithm (keywords, type, recency, category bonuses)
- Top 10 relevant learnings (up from 5) with intelligent scoring
- Fixed a briefing update bug: briefings are now regenerated for in-progress items
- Fixed a timing issue: work_items data is now reloaded after recording commits to ensure accurate summaries
- Makes multi-session work practical by eliminating context loss
- 22 new comprehensive unit tests for helper functions and enhanced functionality
- Updated documentation in .claude/commands/start.md and .claude/commands/end.md
Work item deletion - Safe deletion of work items with dependency checking
- New sk work-delete <work_item_id> command
- Interactive mode with 3 options: keep spec, delete spec, or cancel
- Non-interactive mode with --keep-spec and --delete-spec flags
- Dependency checking warns about dependent work items
- Automatic metadata updates (total_items, status counts)
- 19 comprehensive unit tests
- Full documentation in .claude/commands/work-delete.md and docs/commands/work-delete.md
Work item completion status control - Explicit control over work item completion during session end
- Interactive 3-choice prompt: "Mark completed", "Keep in-progress", "Cancel"
- Command-line flags: --complete and --incomplete
- Supports multi-session workflows
- 8 unit tests added
PyPI Publishing Workflow - Automated package publishing to PyPI on GitHub releases
Comprehensive test infrastructure - Test suite reorganization and expansion
- 1,408 comprehensive tests (up from 183, roughly a 7.7x increase)
- 85% code coverage (up from 30%)
- Unit/integration/e2e structure across 35 test files
- 4 modules at 100% coverage, 20 modules at 75%+ coverage
Auto git initialization - sk init now automatically initializes git repository and creates initial commit
Pre-flight commit check - sk end validates all changes are committed before running quality gates
CHANGELOG workflow improvements - Git hooks with reminders + smarter branch-level detection
OS-specific .gitignore patterns - macOS, Windows, and Linux patterns automatically added during sk init

Changed

BREAKING: Package structure migrated to standard Python src/ layout
- Moved all Python modules from flat directory to organized src/solokit/ package structure
- Created domain-organized subdirectories: core/, session/, work_items/, learning/, quality/, visualization/, git/, testing/, deployment/, project/
- Updated all imports from scripts.X to solokit.X pattern (43 files)
- Removed all sys.path.insert() hacks (38 instances)
- Removed setup.py in favor of PEP 517/518 pyproject.toml-only configuration
- CLI command remains solokit (no user-facing changes)
- All tests pass, PyPI-ready structure, better IDE support
Simplified git branch naming - Branch names now use work item ID directly
- Format: feature_oauth instead of session-001-feature_oauth
- Clearer intent, shorter names, backward compatible
Standardized spec validation - All work item types now use "Acceptance Criteria" section consistently
- Updated refactor specs to use "Acceptance Criteria" (was "Success Criteria")
Makefile clean target - Now removes coverage artifacts (htmlcov/, coverage.xml, coverage.json)

Fixed

Quality gates test timeout - Increased from 5 to 10 minutes (1408 tests take ~6 minutes)
Docstring validation - Fixed pydocstyle configuration to properly validate project docstrings
Bug #25: Git branch status now finalizes when switching work items (12 unit tests)
Bug #24: /start command now properly handles explicit work item selection (3 unit tests)
Bug #23: Bug/refactor spec templates now include "Acceptance Criteria" section
Bug #21: Learning curator no longer extracts test data strings (21 unit tests)
Bug #20: Multi-line LEARNING statements now captured completely (30 unit tests)
UX improvements: Auto git init, pre-flight checks, CHANGELOG reminders, clear error messages

Removed

Deleted obsolete development tracking files (NEXT_SESSION_PROMPT.md, TEST_PROGRESS.md)
Removed 38 instances of sys.path.insert() manipulation
Removed flat directory structure
Removed E402 ignore from ruff config

[0.5.8] - 2025-10-21

Added

Marketplace Plugin Support: Solokit now works as a Claude Code marketplace plugin
One-time setup command for plugin users: pip install -e ~/.claude/plugins/marketplaces/claude-plugins/solokit
Simplified installation documentation with clear paths for both marketplace and direct installation

Changed

Unified CLI: All 15 slash command files now use solokit command instead of relative paths
Updated command files: init.md, start.md, end.md, status.md, validate.md, learn*.md, work-*.md
Simplified README installation section with two clear options (marketplace vs. direct)
Updated all CLI examples throughout documentation to use solokit command
Updated marketplace README (claude-plugins/README.md) with v0.5.8 installation instructions
Updated Architecture Notes to reflect v0.5.8 changes

Technical Details

Files Modified: 18 files total
- 15 command files (.claude/commands/*.md)
- 1 main README (README.md)
- 1 marketplace README (in separate repo)
- 1 pyproject.toml (version bump)
Breaking Changes: Command files no longer use relative Python paths - now use solokit CLI
Migration: Users must run pip install -e . if not already done

Migration Guide

For marketplace plugin users:

pip install -e ~/.claude/plugins/marketplaces/claude-plugins/solokit

For existing direct installations:

cd /path/to/solokit
pip install -e .

All slash commands will now work via the solokit CLI.

Benefits

✅ Plugin works from marketplace installation
✅ No need to clone Solokit into every project
✅ Cleaner, more standard approach
✅ Works identically whether installed directly or via marketplace
✅ Aligns with Python package best practices

Reference

See ROADMAP.md Phase 5.8 for complete details.

[0.5.7] - 2025-10-18

Added

Spec-first architecture: .session/specs/*.md files are now the single source of truth for work item content
Comprehensive markdown parser (spec_parser.py, 700+ lines) supporting all 6 work item types
Spec file validation system with required section checks and quality gates
Complete context loading - removed all compression (50-line tree limit, 500-char doc limits)
Writing guide (docs/guides/writing-specs.md, 500+ lines) with examples for all work item types
Template structure documentation (docs/reference/spec-template-structure.md)

Changed

Eliminated dual storage problem - work item content now only in spec files, not work_items.json
Enhanced all 6 spec templates with comprehensive examples and inline guidance
Updated briefing system to load full spec content without truncation
Refactored validators and runners to use spec parser
Quality gates now validate spec completeness before session completion

Removed

Content fields from work_items.json (rationale, acceptance_criteria, implementation_paths, test_paths)
Compression limits on project documentation
Duplicate briefing sections

Technical Details

Tests Added: 49 tests across 6 test files
Code Added: ~3,200 lines (spec_parser.py, spec_validator.py, templates, docs)
Files Created: 8 new files (validator, docs, test files)
Files Enhanced: 12 files (briefing_generator, quality_gates, templates, commands)

Reference

See ROADMAP.md Phase 5.7 for complete details.

[0.5.6] - 2025-10-15

Added

Deployment work item type with comprehensive validation framework
Deployment execution framework with pre-deployment validation and rollback automation
Environment validation system with 7 validation types (connectivity, configuration, dependencies, health checks, monitoring, infrastructure, capacity)
Deployment quality gates integrated with quality_gates.py
Multi-environment support (staging vs production with different configurations)
Automated smoke test execution with timeout and retry support
Dry-run mode for deployment simulation

Changed

Enhanced deployment_spec.md template with 11 sections including deployment procedure, rollback, smoke tests
Session workflow now includes deployment-specific briefings and summaries
Quality gates include deployment validation before execution

Technical Details

Tests Added: 65 tests across 5 test files
Code Added: ~2,049 lines (deployment_executor.py, environment_validator.py, enhanced templates)
Validation Types: 7 comprehensive environment checks
Focus: Production deployment safety and automation

Reference

See ROADMAP.md Phase 5.6 for complete details.

[0.5.5] - 2025-10-15

Added

Integration testing framework with comprehensive validation
Enhanced integration test work item type with multi-component dependency tracking
Integration test execution framework with Docker Compose orchestration
Performance benchmarking system with regression detection (10% threshold)
API contract validation with breaking change detection
Integration quality gates with environment validation
Integration documentation requirements (architecture diagrams, sequence diagrams, API contracts)
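
The 10% regression threshold mentioned above can be read as a relative comparison against a stored baseline. The following is a minimal sketch under that assumption, for a metric such as latency where higher is worse; the function name and parameters are hypothetical and not taken from the benchmarking module.

```python
def is_regression(baseline_ms: float, current_ms: float, threshold: float = 0.10) -> bool:
    """Return True when the current value is worse than baseline by more than threshold."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    relative_change = (current_ms - baseline_ms) / baseline_ms
    return relative_change > threshold
```

For example, a p95 latency moving from 100 ms to 120 ms is a 20% increase and would be flagged, while a move to 105 ms would not. For throughput, where lower is worse, the sign of the comparison would need to be flipped.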

Changed

Enhanced integration_test_spec.md template with test scenarios, performance benchmarks
Session workflow includes integration-specific briefings and summaries
Quality gates validate integration test environment before execution

Technical Details

Tests Added: 178 tests across 7 test files
Code Added: ~5,458 lines (integration_test_runner.py, performance_benchmark.py, api_contract_validator.py)
Performance Tracking: Latency percentiles (p50, p75, p90, p95, p99), throughput, response time
Focus: Multi-service integration validation and performance regression detection

Reference

See ROADMAP.md Phase 5.5 for complete details.

[0.5] - 2025-10-14

Added

Quality gates system for automated quality enforcement at session completion
Test execution with coverage parsing and multi-language support (Python, JavaScript, TypeScript)
Security scanning integration (bandit, safety, npm audit) with severity-based filtering
Linting and formatting with auto-fix modes (ruff, eslint, prettier)
Documentation validation (CHANGELOG, docstrings, README)
Context7 MCP integration (stub ready for production)
Custom validation rules (per-work-item and project-level)
Quality gate reporting with remediation guidance

Changed

Session completion now enforces quality standards before allowing completion
Extracted quality gate logic into dedicated quality_gates.py module (770 lines)
Added quality gates configuration to .session/config.json during /init

Fixed

pytest exit code 5 ("no tests collected") now treated as skipped, not failed
Auto-fix modes for linting and formatting improve developer experience
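
pytest exits with code 5 when it collects no tests, which is different from a test failure. The mapping described above can be sketched as follows; the function name and the status strings are hypothetical.

```python
def classify_pytest_exit(returncode: int) -> str:
    """Map a pytest exit code to a gate status (sketch)."""
    if returncode == 0:
        return "passed"
    if returncode == 5:
        # pytest: no tests were collected, so there is nothing to fail.
        return "skipped"
    return "failed"
```

Treating code 5 as skipped keeps new projects without tests from being blocked at session completion.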

Technical Details

Tests Added: 54 tests across all quality gate types
Code Added: 875 lines (quality_gates.py, config integration)
Tools Supported: pytest, ruff, bandit, safety, eslint, prettier, npm audit
Configuration: Required vs optional gate enforcement

Reference

See ROADMAP.md Phase 5 for complete details.

[0.4] - 2025-10-14

Added

Learning capture and curation system for knowledge management
4 learning commands: /learn, /learn-show, /learn-search, /learn-curate
Auto-categorization into 6 categories (architecture_patterns, gotchas, best_practices, technical_debt, performance_insights, security)
Similarity detection using Jaccard (0.6) and containment (0.8) thresholds
Automatic duplicate detection and merging
Multi-source learning extraction (session summaries, git commits with LEARNING:, inline # LEARNING: comments)
Enhanced browsing with filters (category, tags, date range, session number)
Statistics dashboard and timeline view
Auto-curation trigger every N sessions (default 5, configurable)
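
To make the similarity thresholds above concrete, here is a minimal sketch rather than the learning_curator.py implementation. It assumes whitespace tokenization, defines containment as the overlap divided by the smaller word set, and treats two learnings as duplicates when either threshold is met; all three choices are assumptions, as are the function names.

```python
def tokenize(text):
    """Lowercase and split on whitespace (assumed tokenization)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a, b):
    """Share of the smaller set that also appears in the larger one (assumed definition)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def is_similar(text_a, text_b, jaccard_threshold=0.6, containment_threshold=0.8):
    """Flag two learnings as likely duplicates if either threshold is met."""
    a, b = tokenize(text_a), tokenize(text_b)
    return (
        jaccard(a, b) >= jaccard_threshold
        or containment(a, b) >= containment_threshold
    )
```

Under these assumptions, "use connection pooling for the database" and "use connection pooling for the database in production" share 6 of 8 distinct words (Jaccard 0.75) and would be merged, while two learnings with no words in common would not.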

Changed

Sessions now include automated learning capture at completion
.session/config.json includes learning configuration (auto_curate_frequency, similarity_threshold)

Technical Details

Tests Added: 53 tests across all learning features
Code Added: ~1,587 lines (commands, documentation, integration)
Documentation: docs/reference/learning-system.md guide (550 lines)
Categories: 6 comprehensive categories covering software development learnings

Reference

See ROADMAP.md Phase 4 for complete details.

[0.3] - 2025-10-13

Added

Work item dependency graph visualization with critical path analysis
/work-graph command with multiple output formats (ASCII, DOT, SVG)
Graph filtering options (status, milestone, type, focus node, include-completed)
Critical path analysis with automatic highlighting in all formats
Bottleneck detection (items blocking 2+ others)
Graph statistics (total items, completion percentage, critical path length)
Neighborhood view with --focus for exploring specific work items
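
The bottleneck and critical path features listed above can be sketched as follows. This is an illustration, not the dependency_graph.py code. It assumes work items are stored as a dict keyed by id, each with a "dependencies" list (a hypothetical layout), that the graph is acyclic, and that the critical path is the longest dependency chain.

```python
from collections import Counter


def find_bottlenecks(work_items, min_blocked=2):
    """Return ids of items that block at least min_blocked other items."""
    blocked_counts = Counter()
    for item in work_items.values():
        for dep in item.get("dependencies", []):
            blocked_counts[dep] += 1
    return sorted(i for i, n in blocked_counts.items() if n >= min_blocked)


def critical_path(work_items):
    """Return the longest dependency chain, ordered from first to last item."""
    memo = {}

    def longest_chain_to(item_id):
        # Longest chain of dependencies ending at item_id (memoized DFS).
        if item_id in memo:
            return memo[item_id]
        best = []
        for dep in work_items[item_id].get("dependencies", []):
            if dep in work_items:
                candidate = longest_chain_to(dep)
                if len(candidate) > len(best):
                    best = candidate
        memo[item_id] = best + [item_id]
        return memo[item_id]

    return max((longest_chain_to(i) for i in work_items), key=len, default=[])
```

With items a, b, c, d where b and c depend on a and d depends on b, item a is the only bottleneck and the critical path is a, b, d.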

Changed

Enhanced dependency_graph.py with 313 new lines for CLI integration
Graph visualization updates automatically when work items change

Technical Details

Tests Added: 36 tests across 6 sections
Code Added: 426 lines (command integration, enhanced graph features)
Formats: ASCII (terminal-friendly), DOT (Graphviz), SVG (documentation)
Focus: Understanding project structure and identifying bottlenecks

Reference

See ROADMAP.md Phase 3 for complete details.

[0.2] - 2025-10-13

Added

Work item management system with full CRUD operations
6 work item types (feature, bug, refactor, security, integration_test, deployment)
5 work item commands: /work-new, /work-list, /work-show, /work-update, /work-next
Dependency tracking and resolution
Priority levels (critical, high, medium, low) with visual indicators (🔴🟠🟡🟢)
Milestone organization and progress tracking
Status tracking (backlog, in_progress, completed, blocked)
Conversational interface for work item creation (Claude Code compatible)

Changed

Sessions now include comprehensive work item tracking
Briefings include milestone context and dependency status
/status command shows work item context and progress

Technical Details

Tests Added: 9 tests for work item management
Code Added: work_item_manager.py (500+ lines)
CLI Commands: Non-interactive mode for Claude Code compatibility
Storage: JSON-based work item tracking in .session/work_items.json

Reference

See ROADMAP.md Phase 2 for complete details.

[0.1] - 2025-10-13

Added

Core session management framework with complete workflow
/init command for project initialization
Stack tracking system (generate_stack.py) with technology detection
Tree tracking system (generate_tree.py) with structure change detection
Git workflow integration (git_integration.py) with branch management
Enhanced /start with comprehensive context loading (docs, stack, tree, git)
Enhanced /end with tracking updates and quality gates
/validate command for pre-flight checks before session completion
Multi-session work item support (resume on same branch)

Changed

Session initialization creates .session/ directory structure
Briefings include full project context (vision, architecture, stack, tree)
Session completion updates all tracking files automatically

Technical Details

Tests Added: 6 core tests
Code Added: 2,174 lines across 12 scripts
Infrastructure: .session/ directory with tracking files
Git Integration: Automatic branch creation, commit, push, merge

Reference

See ROADMAP.md Phase 1 for complete details.

[0.0] - 2025-10-10

Added

Foundation and documentation for Session-Driven Development methodology
Repository structure with .claude/commands/ directory (16 slash commands)
Basic briefing generation (briefing_generator.py)
Basic session completion (session_complete.py)
Learning curation system (learning_curator.py) - complete and production-ready
Dependency graph visualization (dependency_graph.py) - complete and production-ready
File operation utilities (file_ops.py)
Comprehensive methodology documentation (docs/solokit-methodology.md)
Implementation insights documentation (docs/implementation-insights.md)
AI-augmented framework reference (docs/ai-augmented-solo-framework.md)

Technical Details

Work Item Schema: Defined in templates/work_items.json
Learning Schema: Defined in templates/learnings.json
Algorithms: Dependency resolution (DFS-based), Learning categorization (keyword-based), Similarity detection (Jaccard + containment)
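
As a rough illustration of DFS-based dependency resolution, the sketch below orders items so that each one comes after its dependencies and raises on cycles. It is not the project's implementation; the function name and the input shape (a dict mapping item id to a list of dependency ids) are assumptions.

```python
def resolve_order(dependencies):
    """Return item ids ordered so every item comes after its dependencies."""
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {item: UNVISITED for item in dependencies}
    order = []

    def visit(item):
        current = state.get(item, UNVISITED)
        if current == DONE:
            return
        if current == IN_PROGRESS:
            # Reaching an item that is still on the DFS stack means a cycle.
            raise ValueError(f"dependency cycle detected at {item!r}")
        state[item] = IN_PROGRESS
        # Ids that appear only as dependencies are treated as having none.
        for dep in dependencies.get(item, []):
            visit(dep)
        state[item] = DONE
        order.append(item)

    for item in dependencies:
        visit(item)
    return order
```

Calling resolve_order({"deploy": ["build"], "build": ["lint"], "lint": []}) returns ["lint", "build", "deploy"].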

Reference

See ROADMAP.md Phase 0 for complete details.

Version Numbering

Versions follow semantic versioning (MAJOR.MINOR.PATCH):

MAJOR: Incompatible API changes
MINOR: New functionality (backward compatible)
PATCH: Bug fixes (backward compatible)

Phase mapping to public release versions:

Phases 0-5.9 (Development phases) → v0.1.0 (Initial Public Release)
- Phase 0: Foundation & documentation
- Phase 1: Core session workflow
- Phase 2: Work item system
- Phase 3: Dependency graphs
- Phase 4: Learning management
- Phase 5: Quality gates
- Phase 5.5: Integration testing
- Phase 5.6: Deployment support
- Phase 5.7: Spec-first architecture
- Phase 5.8: Marketplace plugin support
- Phase 5.9: Standard Python src/ layout & PyPI publishing
v0.3.0 = Current release ✅ (Minimal init mode, bug fixes)
v0.2.2 = Previous release (Fix missing test_execution.commands config)
v0.2.1 = Earlier release (Critical CVE patches for Next.js/React templates)
v0.2.0 = Earlier release (Tailwind CSS v4 migration, CHANGELOG check fixes)
v0.1.7 = Earlier release (Improved /end command flow, slash command format)
v0.1.6 = Earlier release (Minimal scaffolding migration complete)
v0.1.5 = Earlier release (GitHub setup integration, security fixes)
v0.1.4 = Earlier release (Test coverage improvements)
v0.1.3 = Earlier release (Documentation model improvements)
v0.1.1 = Earlier release (UX improvements & bug fixes)
v1.0.0 = Stable API release (planned)