Changelog
January 15, 2026
All notable changes to the Solokit (Session-Driven Development) project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Added
- Meaningful Test Guidance in Documentation
- Added explicit guidance that tests should represent real use cases, not just satisfy coverage metrics
- Updated PRD_WRITING_GUIDE.md with "Write Meaningful Tests, Not Just Coverage" section
- Updated writing-specs.md with new tip on scenario-driven vs coverage-driven tests
- Key principle: "If this feature broke in production, would this test catch it?"
- Helps prevent false confidence from tests that pass but don't catch real bugs
[0.3.0] - 2026-01-14
Added
- Minimal Init Mode (`sk init --minimal`) (PR #207)
- New `--minimal` flag for lightweight project initialization
- Installs only session tracking infrastructure without templates or quality tiers
- Ideal for simple projects (HTML sites, scripts, prototypes) that don't need testing/linting
- Creates: `.session/` structure, guides, Claude Code slash commands, CLAUDE.md, README.md, CHANGELOG.md
- Quality gates disabled by default in minimal mode config
- Git repository and GitHub setup still available
- Comprehensive test coverage: 28 unit tests, 11 integration tests
Fixed
- `/end` Command Fails on New Projects with Few Commits
- Changed `git log --oneline HEAD~10..HEAD` to `git log --oneline -10` in end.md command
- The previous syntax fails when a repository has fewer than 10 commits
- Affects all solokit-initialized projects when running `/end` early in development
- Semgrep CI Installation Failure in ml_ai_fastapi Template
- Replaced unpinned `pip install semgrep` with official `returntocorp/semgrep-action@v1` GitHub Action
- Fixes CI security workflow failures caused by semgrep dependency resolution cascade
- The official action handles installation complexity and provides native GitHub integration
- Affects projects using ml_ai_fastapi template with CI/CD option enabled
[0.2.2] - 2025-12-05
Fixed
- Missing test_execution.commands in Generated Config (PR #193)
- Added `commands` configuration to `test_execution` in session_structure.py
- Previously, `sk validate` would show `✗ tests: None` because no test command was configured
- Now includes default commands for Python (`pytest`), JavaScript, and TypeScript (`npm test`)
- Affects all projects initialized with `sk init` or `sk adopt`
- Validation Message Showing "None" Instead of Error
- Fixed display bug where validation used `dict.get()` incorrectly
- Now properly shows "Tests fail" message instead of "None"
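The entry above does not show the offending code. As a minimal sketch of how a `dict.get()` call can surface the literal text "None" in a status line, consider the following; the result dictionary shape and both function names are hypothetical and are not taken from Solokit's source.

```python
def format_status_buggy(result: dict) -> str:
    # Hypothetical bug: "message" is absent on failure, so .get() returns None
    # and the f-string renders the literal text "None".
    mark = "✓" if result.get("passed") else "✗"
    return f"{mark} tests: {result.get('message')}"


def format_status_fixed(result: dict) -> str:
    # Supply an explicit fallback instead of relying on .get()'s None default.
    passed = result.get("passed", False)
    mark = "✓" if passed else "✗"
    default = "Tests pass" if passed else "Tests fail"
    return f"{mark} tests: {result.get('message') or default}"


failing = {"passed": False}
print(format_status_buggy(failing))  # ✗ tests: None
print(format_status_fixed(failing))  # ✗ tests: Tests fail
```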
Changed
- Updated config.schema.json with `commands` property for test_execution
- Updated test fixtures and documentation with complete configuration examples
[0.2.1] - 2025-12-04
Security
- Critical CVE Patches for Next.js and React Templates (PR #191)
- Updated Next.js from 16.0.1 to 16.0.7 to address CVE-2025-66478 (CVSS 10.0)
- Updated React from 19.2.0 to 19.2.1 to address CVE-2025-55182
- Updated react-dom from 19.2.0 to 19.2.1
- Updated eslint-config-next from 16.0.1 to 16.0.7
- Updated @next/bundle-analyzer from 16.0.1 to 16.0.7
- Affects all Next.js templates: fullstack_nextjs, saas_t3, dashboard_refine (all tiers)
- Reference: https://nextjs.org/blog/CVE-2025-66478
[0.2.0] - 2025-12-03
Fixed
- Tailwind CSS v4 Migration for All Templates (fullstack_nextjs, saas_t3, dashboard_refine)
- Migrated from Tailwind v3 `@tailwind` directives to v4 `@import "tailwindcss"` syntax
- Moved theme configuration from JavaScript `tailwind.config.ts` to CSS `@theme` blocks
- Updated components to use theme tokens (`bg-background`, `text-foreground`) instead of hardcoded colors
- Fixed accessibility violation: Changed `text-blue-500` to `text-blue-700` for WCAG AA contrast compliance
- Updated all test files to match new theme-based styling expectations
- Added comprehensive Tailwind v4 documentation to ARCHITECTURE.md files
- CHANGELOG Update Check False Positives in /end Command
- Changed check from `HEAD~10` (last 10 commits) to `main...HEAD` (commits since branch creation)
- Previous logic gave false positives when a prior session had updated CHANGELOG
- Now correctly detects whether CHANGELOG was updated in the current session/branch
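The branch-scoped check described above could be sketched as follows. This is an illustration only: the changelog does not say which git subcommand the `/end` command uses, so the use of `git diff --name-only` and both function names are assumptions.

```python
import subprocess


def changelog_touched(changed_files: list[str]) -> bool:
    # True if any changed path is a CHANGELOG.md (at any directory depth).
    return any(path.split("/")[-1] == "CHANGELOG.md" for path in changed_files)


def files_changed_since_branch(base: str = "main") -> list[str]:
    # With `git diff`, `base...HEAD` (three dots) compares HEAD against its
    # merge base with `base`, i.e. only what this branch changed since it forked.
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]
```

Inside a repository, `changelog_touched(files_changed_since_branch())` would then ignore CHANGELOG edits that were already merged into `main` by an earlier session.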
[0.1.7] - 2025-11-28
Changed
- Improved /end Command Flow (PR #184)
- Streamlined `/end` command with clear 5-step process: pre-flight checks, completion status, run `sk end`, create PR, show results
- `sk end` no longer attempts commits - Claude handles git commits, `sk end` just verifies and pushes
- Better error messages: Clear guidance when no commits found on branch
- Removed confusing two-option learning approach (commit tags vs temp file)
- Slash Command Format for User-Facing Suggestions (PR #186)
- Updated all user-facing command suggestions to use slash command format (`/start`, `/end`, `/validate`) instead of CLI format (`sk start`, `sk end`)
- Updated 30 files including CLAUDE.md templates, Python CLI output, and guides
- Principle: User-facing suggestions use `/slash` format, execution examples use `sk` CLI format
Fixed
- CHANGELOG.md Not Being Copied During `sk init` (PR #184)
- CHANGELOG.md template now properly copied to project root during initialization
- Won't overwrite existing CHANGELOG.md files
- Git Commit Double-Wrapped Error Messages (PR #184)
- Fixed "Commit failed: Commit failed:" double-wrapped error messages
- CI Template Permissions and Smoke Test Endpoint (PR #185)
- Added `pull-requests: read` permission to secrets-scan job in security.yml (required for gitleaks-action)
- Updated smoke-test in test.yml to test `/` instead of `/api/health` for Node.js templates
- Smoke tests no longer require database connectivity
Added
- PRD and STACK Guide References in Templates (PR #183)
- Added "Writing PRDs" subsection under Claude Behavior Guidelines in all CLAUDE.md templates
- Updated "Reference Documentation" section with mandatory PRD guide reference
- Updated "Key Files" table to include PRD_WRITING_GUIDE.md and STACK_GUIDE.md
[0.1.6] - 2025-11-27
Changed
- Minimal Scaffolding Migration: Phase 4 Documentation & Cleanup
- Updated main README.md:
- Changed "production-ready templates" messaging to "minimal scaffolding templates"
- Added PRD-driven development workflow
- Updated template descriptions to emphasize documentation over example code
- Updated test counts to 3,802 tests
- Added Development Guides section for `.session/guides/`
- Updated template-registry.json:
- All template descriptions now say "Minimal scaffolding for..."
- Category descriptions updated to reflect minimal scaffolding approach
- Documentation consistency fixes across all 4 stacks:
- Added "These are files you will CREATE" clarifications to ARCHITECTURE.md examples
- Added "target structure you will build" notes to STACK_GUIDE.md file organization sections
- Fixed path inconsistencies in CLAUDE.md.template files
- Updated test counts in docs/project/ROADMAP.md and tests/e2e/README.md
- All 3,802 tests pass
- Minimal Scaffolding Migration: dashboard_refine (Phase 3.4)
- Removed example code (~977 lines) from dashboard_refine template:
- Removed `app/(dashboard)/users/` directory (95 + 122 lines) - Example users list page and tests
- Removed `components/forms/user-form.tsx` and tests (76 + 206 lines) - Example form component
- Removed `lib/validations.ts` and tests (52 + 225 lines) - Example validation schemas
- Removed `tier-1-essential/tests/unit/example.test.tsx` (77 lines) - Example tests
- Removed `tier-3-comprehensive/tests/e2e/user-management.spec.ts` (124 lines) - User E2E tests
- Removed `tier-3-comprehensive/tests/integration/dashboard.test.ts` (28 lines) - Example integration test
- Updated `lib/refine.tsx`:
- Replaced mock data provider with placeholder that throws helpful errors
- Cleared example resources array
- Added comprehensive documentation for data provider setup
- Updated `components/layout/sidebar.tsx`: Simplified to Dashboard route only
- Updated `app/(dashboard)/page.tsx`: Minimal welcome page with guidance cards
- Updated `app/(dashboard)/__tests__/page.test.tsx`: Tests for minimal page
- Updated `lib/__tests__/refine.test.tsx`: Tests for placeholder data provider
- Updated `components/layout/__tests__/sidebar.test.tsx`: Tests for minimal sidebar
- Updated `tier-3-comprehensive/tests/e2e/dashboard.spec.ts`: Tests for minimal dashboard
- Updated `providers/__tests__/refine-provider.test.tsx`: Fixed mock to use empty resources array
- Created `tier-3-comprehensive/tests/integration/api.test.ts`: Placeholder integration test
- Updated `ARCHITECTURE.md`:
- Added "Building From Scratch" section with complete Refine patterns
- Updated "Decision 2" to explain data provider requirement
- Added comprehensive "Data Provider Options" section with REST, GraphQL, Supabase, Custom examples
- Updated project structure to reflect minimal scaffolding
- Updated `CLAUDE.md.template`:
- Added "Building From Scratch" section with quick pattern reference
- Updated file organization table with correct paths
- Updated data provider warning to reflect placeholder behavior
- Cleaned up empty directories (tier-1-essential/tests/unit/, tier-3-comprehensive/tests/integration/)
- Documentation Consistency Fixes (All 4 stacks)
- saas_t3 ARCHITECTURE.md: Moved "Building From Scratch" section from end to after Overview
- ml_ai_fastapi CLAUDE.md.template: Reordered sections (Building From Scratch before Stack Architecture Rules)
- dashboard_refine CLAUDE.md.template: Reordered sections (Building From Scratch before Stack Architecture Rules)
- dashboard_refine docker/README.md: Fixed "mock data provider" → "placeholder data provider" terminology
- fullstack_nextjs CLAUDE.md.template: Removed incorrect `src/` prefix from file paths
- fullstack_nextjs ARCHITECTURE.md: Fixed `lib/validations/[feature].ts` → `lib/validations.ts`
- saas_t3 CLAUDE.md.template: Removed incorrect `src/` prefix from file paths
- Minimal Scaffolding Migration: saas_t3 (Phase 3.3)
- Removed example code (~400 lines) from saas_t3 template:
- Removed `server/api/routers/example.ts` (33 lines) - tRPC CRUD router
- Removed `server/api/routers/__tests__/example.test.ts` (220 lines) - Router tests
- Removed `components/example-component.tsx` (22 lines) - tRPC usage example
- Removed `components/__tests__/example-component.test.tsx` (117 lines) - Component tests
- Removed `tier-1-essential/tests/unit/example.test.tsx` (24 lines) - Example tests
- Updated `server/api/root.ts`: Removed example router, kept commented example
- Updated `app/page.tsx`: Minimal welcome page (no tRPC usage)
- Updated `app/__tests__/page.test.tsx`: Tests for minimal page
- Updated `server/api/__tests__/root.test.ts`: Removed example router references
- Updated `prisma/schema.prisma`: Removed User model, kept commented example
- Updated `tier-3-comprehensive/tests/e2e/home.spec.ts`: Tests for minimal page
- Updated `ARCHITECTURE.md`:
- Added "Building From Scratch" section with complete tRPC patterns
- Added "Type Safety Flow" diagram showing Prisma → tRPC → React chain
- Updated project structure to reflect minimal scaffolding
- Updated `CLAUDE.md.template`:
- Added "Building From Scratch" section
- Updated code patterns to use generic `posts` example instead of `example`
- Added `.gitkeep` to preserve empty `components/` and `routers/` directories
- Minimal Scaffolding Migration: ml_ai_fastapi (Phase 3.2)
- Removed example code (~430 lines) from ml_ai_fastapi template:
- Removed `src/api/routes/example.py` (135 lines) - Full CRUD router
- Removed `src/models/example.py` (61 lines) - SQLModel Item model
- Removed `src/services/example.py` (115 lines) - Service layer example
- Removed `tests/unit/test_example.py` (117 lines) - Example service tests
- Updated `main.py.template`: Removed example router import and include
- Updated `models/__init__.py`: Removed Item exports, added documentation example
- Updated `core/database.py`: Removed Item import, added documentation example
- Updated `alembic/env.py`: Removed Item import, added documentation example
- Updated `tests/unit/test_api_routes.py`: Removed TestItemRoutes, kept TestHealthRoutes
- Updated `tests/integration/test_api.py`: Removed TestItemAPIIntegration, kept TestHealthEndpoints
- Updated `locustfile.py`: Removed Item API tests, kept health check tests only
- Updated `tier-4-production/src/core/monitoring.py`: Removed example-specific counters
- Updated `ARCHITECTURE.md`:
- Added "Building From Scratch" section with step-by-step guide
- Updated project structure to reflect minimal scaffolding
- Updated `CLAUDE.md.template`:
- Added "Building From Scratch" section
- Added "Quick Pattern Reference" for adding new features
- Minimal Scaffolding Migration: fullstack_nextjs (Phase 3.1)
- Removed example code (~1,206 lines) from fullstack_nextjs template:
- Removed `app/api/example/` route and tests
- Removed `components/example-component.tsx` and tests
- Removed `lib/validations.ts` (example Zod schemas) and tests
- Removed `User` model from Prisma schema (kept commented example)
- Updated `page.tsx` to minimal welcome page (no example imports)
- Updated `ARCHITECTURE.md`:
- Added "Building From Scratch" section with step-by-step guide
- Removed references to example files in project structure
- Updated `CLAUDE.md.template`:
- Added "Building From Scratch" section
- Updated guidance to reference ARCHITECTURE.md instead of existing code
- Added health check test at `app/api/health/__tests__/route.test.ts`
- Updated E2E tests (`flow.spec.ts`) for minimal page
- Added `.gitkeep` to preserve empty `components/` directory
Added
- Quality Gate Adjustments for Minimal Scaffolding (Phase 2 of Minimal Scaffolding Migration)
- New `scaffolding.py` module with minimal scaffolding detection functions:
- `is_minimal_scaffolding()`: Detects if project has only health check code
- `has_integration_test_files()`: Checks for integration test files
- `has_e2e_test_files()`: Checks for E2E test files (Playwright/Cypress)
- Integration tests now skip gracefully when no integration test files exist
- Integration tests handle missing spec files by skipping instead of failing
- Added 23 new tests for scaffolding module
- Added test for new integration checker behavior
- CI/CD workflows already use `--if-present` for conditional E2E/integration tests
- Coverage threshold only enforced when coverage report exists
- Tier configs use glob patterns that gracefully handle empty directories
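A rough sketch of the "skip when there is nothing to run" behavior described in this entry follows. Only the function name `has_e2e_test_files` appears in the changelog; its signature, the glob patterns, and the `run_e2e_gate` helper are assumptions made for illustration, and the real `scaffolding.py` may look in different places.

```python
from pathlib import Path

# Hypothetical glob patterns; the real module may search elsewhere.
E2E_PATTERNS = (
    "tests/e2e/**/*.spec.ts",
    "e2e/**/*.spec.ts",
    "cypress/e2e/**/*.cy.ts",
)


def has_e2e_test_files(project_root: Path) -> bool:
    # Return True as soon as any pattern matches at least one path.
    return any(
        next(project_root.glob(pattern), None) is not None
        for pattern in E2E_PATTERNS
    )


def run_e2e_gate(project_root: Path) -> str:
    # Skip (rather than fail) when the project has no E2E specs yet.
    if not has_e2e_test_files(project_root):
        return "skipped"
    return "run"
```

Returning a distinct "skipped" status, instead of treating an empty test directory as a failure, is what lets a freshly scaffolded project pass its quality gates before any specs exist.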
- PRD Writing Guide and Stack Selection Guide
- New `STACK_GUIDE.md`: Comprehensive guide for choosing between the 4 Solokit stacks
- Quick decision tree for stack selection
- Detailed comparison matrix (type safety, API style, learning curve, etc.)
- Per-stack profiles with best-for/not-ideal-for guidance
- When-to-switch guidance
- New `PRD_WRITING_GUIDE.md`: Complete PRD writing guide for Claude-driven development
- Rewritten for AI-assisted workflow (Claude writes PRD and implements code)
- Vertical slices philosophy and INVEST principles
- Technical constraints section with stack selection
- Definition of Ready (DoR) checklist for Claude
- Mapping PRD stories to Solokit work items
- Claude-optimized PRD template
- Guides automatically copied to `.session/guides/` during `sk init`
- Updated post-init message to reference guides and recommend PRD workflow
- New fixture `tracking_template_files_with_guides` for testing
- Added 6 new tests for guide functionality (3,773 → 3,779 total tests)
- Format/Lint Auto-Fix Before Initial Commit
- New Step 19 in `sk init` workflow: runs format and lint auto-fix before initial commit
- Fixes user-provided files (PRD.md, ROADMAP.md, etc.) that may have formatting issues
- Node.js projects: runs `npm run format` (Prettier) and `npm run lint:fix` (ESLint)
- Python projects: runs `ruff format .` and `ruff check --fix .`
- Silent operation: only logs at debug level, non-blocking on failure
- Init flow updated from 20 to 21 steps (GitHub setup moved to Step 21)
- New module: `src/solokit/init/format_lint_fixer.py`
- Added 18 new tests for format_lint_fixer module (3,755 → 3,773 total tests)
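The "silent, non-blocking" behavior of the auto-fix step might look roughly like the sketch below. It is not the contents of `format_lint_fixer.py`; the function name, logger name, and timeout are hypothetical, and only the `ruff` commands come from the entry above.

```python
import logging
import subprocess

logger = logging.getLogger("format_lint_fix_sketch")


def run_auto_fix(commands: list[list[str]], cwd: str = ".") -> list[bool]:
    """Run each fixer command; never raise, only log at debug level."""
    results = []
    for cmd in commands:
        try:
            proc = subprocess.run(
                cmd, cwd=cwd, capture_output=True, text=True, timeout=120
            )
            ok = proc.returncode == 0
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Missing tool or hung process: record the failure and move on.
            logger.debug("auto-fix %s could not run: %s", cmd, exc)
            ok = False
        else:
            if not ok:
                logger.debug("auto-fix %s exited with %s", cmd, proc.returncode)
        results.append(ok)
    return results


# The commands named in the changelog entry for Python projects.
PYTHON_FIXERS = [["ruff", "format", "."], ["ruff", "check", "--fix", "."]]
```

Because failures are only recorded and logged, a missing formatter cannot abort initialization; the initial commit simply proceeds with unformatted files.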
[0.1.5] - 2025-11-26
Added
- GitHub Repository Setup Integration
- New `src/solokit/github/` module for post-init GitHub repository setup
- Interactive prompts to create new repo or connect to existing one
- Supports both `gh` CLI and manual remote configuration
- Added `check_git_installed()` and `check_gh_installed()` to environment validator
- Integrated as Step 20 in `sk init` workflow (after initial commit)
- Added 45 new tests for GitHub setup module (3,710 → 3,755 total tests)
- Safe Config Implementation for `sk adopt`
- Added intelligent file categorization: NEVER_OVERWRITE, MERGE_IF_EXISTS, INSTALL_IF_MISSING
- New backup system: all modified files backed up to `.solokit-backup/<timestamp>/`
- Smart merge strategies for 7 file types: package.json, pyproject.toml, eslint.config.mjs, .prettierrc, .pre-commit-config.yaml, requirements.txt, .husky/pre-commit
- Added `--dry-run` flag to preview changes without modifications
- Improved warning message showing categorized file handling
- Added 119 new tests for backup and merge modules (3,591 → 3,710 total tests)
- Test breakdown: 3,452 unit + 178 integration + 80 e2e tests
- New files: `adopt/backup.py`, `adopt/merge_strategies.py`
- All quality checks passing: ruff, mypy, formatting, 97% coverage
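One plausible reading of the three file categories and the timestamped backup is sketched below. The category names and the `.solokit-backup/<timestamp>/` location come from the entry above; the decision rules in `plan_action`, the timestamp format, and both function names are assumptions rather than the behavior of `adopt/backup.py`.

```python
import shutil
from datetime import datetime
from enum import Enum
from pathlib import Path
from typing import Optional


class FileCategory(Enum):
    NEVER_OVERWRITE = "never_overwrite"
    MERGE_IF_EXISTS = "merge_if_exists"
    INSTALL_IF_MISSING = "install_if_missing"


def plan_action(category: FileCategory, exists: bool) -> str:
    # Assumed rules: missing files are installed, existing files are merged
    # only when their category allows it, and otherwise left untouched.
    if not exists:
        return "install"
    if category is FileCategory.MERGE_IF_EXISTS:
        return "merge"
    return "skip"


def backup_file(project_root: Path, relative: str, stamp: Optional[str] = None) -> Path:
    # Copy the file to .solokit-backup/<timestamp>/<relative> before modifying it.
    stamp = stamp or datetime.now().strftime("%Y%m%d-%H%M%S")  # assumed format
    dest = project_root / ".solokit-backup" / stamp / relative
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(project_root / relative, dest)
    return dest
```

A dry run in this model would call `plan_action` for every template file and print the results without ever calling `backup_file` or writing to the project.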
Security
- Fixed Sentry Security Vulnerability (GHSA-6465-jgvq-jhgp)
- Upgraded `@sentry/nextjs` from 10.23.0 to 10.27.0 in all Next.js templates
- Vulnerability: Sensitive headers leaked when `sendDefaultPii` is set to `true`
- Affected versions: 10.11.0 - 10.26.0
- Updated: `stack-versions.yaml` and all tier-4 `package.json.tier4.template` files
- Fixed npm audit vulnerabilities
- Added `tmp: 0.2.5` override to fix `@lhci/cli` vulnerability
- All Next.js templates now pass `npm audit` with 0 vulnerabilities
Fixed
- TypeScript Type Conflicts in E2E Tests
- Fixed AxeBuilder type conflict between `@playwright/test` and `@axe-core/playwright`
- Applied `as any` cast workaround in all tier-3 e2e test files
- Affected: `saas_t3`, `dashboard_refine`, `fullstack_nextjs` templates
- Prettier Formatting Issues in Templates
- Fixed ARCHITECTURE.md and CLAUDE.md.template formatting across all stacks
- Fixed blank line handling in `readme_generator.py` and `claude_md_generator.py`
- All templates now pass `prettier --check`
- E2E Test Python Executable
- Fixed `test_core_session_workflow.py` to use `sys.executable` instead of hardcoded `python3`
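The last fix above follows a common pattern for tests that spawn a Python subprocess. A minimal sketch, with a hypothetical helper name `run_python`:

```python
import subprocess
import sys


def run_python(args: list[str]) -> subprocess.CompletedProcess:
    # sys.executable is the interpreter running this process (for example the
    # active virtualenv's), so this works even where no `python3` is on PATH.
    return subprocess.run([sys.executable, *args], capture_output=True, text=True)


result = run_python(["-c", "print('ok')"])
print(result.stdout.strip())
```

A hardcoded `python3` can resolve to a different interpreter than the one running the test suite, or to nothing at all on some platforms, which is why the child process is launched through `sys.executable` instead.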
[0.1.4] - 2025-11-24
Added
- Comprehensive Test Coverage Improvements
- Increased overall test coverage from 93% to 96% (+3%)
- Added 242 new tests (2,983 → 3,225 total tests)
- Test breakdown: 2,980 unit + 165 integration + 80 e2e tests
- Files improved to 95% coverage or higher:
- `protocols.py`: 65% → 100%
- `dependency_installer.py`: 78% → 100%
- `readme_generator.py`: 80% → 100%
- `git_context.py`: 83% → 98%
- `environment_validator.py`: 85% → 100%
- `initial_commit.py`: 85% → 100%
- `env_generator.py`: 86% → 100%
- `git_hooks_installer.py`: 89% → 96%
- `template_installer.py`: 89% → 100%
- `gitignore_updater.py`: 90% → 99%
- `extractor.py`: 85% → 96%
- `curator.py`: 86% → 97%
- `archiver.py`: 90% → 100%
- `documentation_loader.py`: 90% → 100%
- `spec_validator.py`: 86% → 100%
- `updater.py`: 86% → 98%
- `repository.py`: 91% → 100%
- `tree.py`: 85% → 98%
- `integration_runner.py`: 86% → 100%
- `performance.py`: 87% → 99%
- `formatter.py`: 88% → 97%
- `query.py`: 88% → 95%
- `quality/checkers/base.py`: 89% → 100%
- `quality/reporters/base.py`: 86% → 100%
- Created 4 new test files:
- `tests/unit/quality/checkers/test_base.py`
- `tests/unit/session/briefing/test_documentation_loader.py`
- `tests/unit/session/briefing/test_formatter.py`
- `tests/unit/session/briefing/test_git_context.py`
- All quality checks passing: ruff, mypy, formatting
- Three-File Documentation Model for Project Initialization
- Implemented comprehensive documentation structure with distinct purposes:
- `README.md`: Quick start guide (generated, project-specific)
- `ARCHITECTURE.md`: Technical documentation (static template, comprehensive)
- `CLAUDE.md`: AI guidance for Claude Code (generated from template)
- Created `ARCHITECTURE.md` files for all 4 stacks with comprehensive technical documentation:
- Architecture decisions with rationale and trade-offs
- Code patterns and examples
- Project structure explanations
- Database workflows
- Troubleshooting guides
- Created `CLAUDE.md.template` files for all 4 stacks with:
- Stack-specific architecture rules and patterns
- Comprehensive Solokit command usage guide
- Claude behavior guidelines
- Work item management instructions
- Session workflow documentation
- Learning capture best practices
- Stack-specific anti-patterns and common mistakes
- Added `claude_md_generator.py` module for CLAUDE.md generation
- Integrated CLAUDE.md generation into `sk init` workflow (Step 6)
- Updated orchestrator step numbering to be sequential (1-19)
- Affects: All stacks (saas_t3, ml_ai_fastapi, dashboard_refine, fullstack_nextjs)
Changed
- README Generator Improvements
- Implemented cumulative quality gates (each tier includes all previous tiers' requirements)
- Added stack-aware quality gates from template registry:
- JavaScript stacks: E2E tests (Playwright), Bundle analysis, Lighthouse CI
- Python stacks: Load testing (Locust), API documentation (OpenAPI), Performance profiling
- Fixed uvicorn command for Python stack: `uvicorn src.main:app --reload` (was `main:app`)
- Added environment setup section with `.env.local` instructions
- Added database setup section (Prisma for npm stacks, Alembic for Python stacks)
- Improved additional options display using registry names and descriptions
- Added ARCHITECTURE.md reference section to all generated READMEs
- Affects: All stacks
- Template Registry Enhancements
- Made quality gates stack-aware with `adds_js` and `adds_python` fields
- Tier 3 quality gates now separate JavaScript-specific (Playwright) from Python-specific (Locust)
- Tier 4 quality gates now separate JavaScript-specific (Bundle analysis, Lighthouse) from Python-specific (OpenAPI, Performance profiling)
- Removed stack_specific field (replaced with cleaner adds_js/adds_python structure)
- Affects: template-registry.json
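The two entries above (cumulative quality gates and stack-aware registry fields) combine naturally. The sketch below shows one way to resolve the gate list for a tier; only the `adds_js` and `adds_python` field names and the tier 3/4 gate names come from the changelog, while the `adds` key, the short tier keys, the tier 1/2 gate names, and `gates_for` are illustrative assumptions rather than the layout of template-registry.json.

```python
# Hypothetical registry fragment (see the assumptions listed above).
TIERS = {
    "tier-1": {"adds": ["Linting"]},
    "tier-2": {"adds": ["Security linting"]},
    "tier-3": {
        "adds": ["Integration tests"],
        "adds_js": ["E2E tests (Playwright)"],
        "adds_python": ["Load testing (Locust)"],
    },
    "tier-4": {
        "adds_js": ["Bundle analysis", "Lighthouse CI"],
        "adds_python": ["API documentation (OpenAPI)", "Performance profiling"],
    },
}


def gates_for(tier: str, stack_is_python: bool) -> list[str]:
    # Accumulate every tier up to and including `tier`; dicts keep insertion
    # order, so iterating TIERS walks the tiers from lowest to highest.
    stack_key = "adds_python" if stack_is_python else "adds_js"
    gates: list[str] = []
    for name, spec in TIERS.items():
        gates += spec.get("adds", []) + spec.get(stack_key, [])
        if name == tier:
            return gates
    raise KeyError(tier)


print(gates_for("tier-3", stack_is_python=True))
```

Keeping shared gates under one key and stack-specific gates under two sibling keys means a README generator never has to filter Playwright out of a Python project's gate list after the fact.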
Fixed
- CI Workflow Improvements for GitHub Actions
- Fixed Lighthouse CI Chrome sandbox issues on GitHub Actions runners
- Added `puppeteerLaunchArgs` with `--no-sandbox`, `--disable-dev-shm-usage`, `--disable-gpu` flags
- Simplified lighthouse npm script to just `lhci autorun`
- Added Playwright browser installation step to lighthouse workflow
- Fixed Gitleaks failing on initial commit (no parent to compare against)
- Added condition to skip secrets-scan on initial push events
- PRs always run secrets-scan correctly
- Fixed dependency-review-action failing when Dependency Graph not enabled
- Added conditional check via GitHub API before running dependency review
- Shows warning if Dependency Graph is not enabled instead of failing
- Removed custom CodeQL workflow to avoid conflicts with GitHub's default setup
- GitHub's default CodeQL setup is recommended (enable in repo Settings > Security)
- Added `/api/health` endpoint to all Next.js stacks for smoke tests
- Affects: All Next.js stacks (fullstack_nextjs, saas_t3, dashboard_refine)
- Python Stack (ml_ai_fastapi) CI Workflow Fixes
- Upgraded FastAPI from 0.115.6 to 0.121.3 to resolve starlette version conflict
- Added `setuptools>=78.1.1` to security dependencies to fix pip-audit vulnerability
- Fixed vulture false positives by renaming unused parameters to `_logger` and `_method_name`
- Fixed Pyright missing imports by installing all deps (dev + prod) in quality-check workflow
- Fixed detect-secrets to handle missing `.secrets.baseline` gracefully
- Fixed pip-audit to run without `--require-hashes` (requirements.txt typically doesn't have hashes)
- Added conditional Dependency Graph check for dependency-review-action
- Fixed cosmic-ray installation for mutation testing workflow
- Removed `.secrets.baseline` from .gitignore (should be tracked in git)
- Affects: ml_ai_fastapi stack
- Playwright Browser Installation and System Dependencies
- Fixed Playwright browsers not launching on Linux due to missing system dependencies
- Added automatic `apt_pkg` Python module fix for Ubuntu 22.04 (common symlink issue)
- Added automatic `sudo npx playwright install-deps` execution on Linux during `sk init`
- Browser binaries are now properly installed AND system dependencies are configured
- Impact: E2E tests, A11y tests, and Lighthouse CI now work out-of-the-box on fresh Linux VMs
- Affects: All Next.js stacks (fullstack_nextjs, saas_t3, dashboard_refine) at tier-3+
- Lighthouse CI Chrome Detection
- Fixed "Chrome installation not found" error in Lighthouse CI on local/VM environments
- Added `scripts/lighthouse.sh` wrapper that auto-detects Chrome/Chromium location
- On GitHub Actions: Uses pre-installed Chrome at `/usr/bin/google-chrome-stable`
- On local/VM: Falls back to Playwright's Chromium if system Chrome not found
- No longer requires manual Chrome installation for local development
- Affects: All Next.js stacks at tier-4-production
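The actual wrapper is a shell script whose contents the changelog does not show. The lookup order it describes (system Chrome first, then Playwright's Chromium) can be sketched in Python as follows; the binary names in `PATH_NAMES`, the `find_chrome` function, and the idea that the caller passes in the Playwright Chromium path are all assumptions.

```python
import os
import shutil
from typing import Callable, Optional

SYSTEM_CANDIDATES = ("/usr/bin/google-chrome-stable",)  # path named in the changelog
PATH_NAMES = ("google-chrome", "chromium", "chromium-browser")  # assumed binary names


def find_chrome(
    playwright_chromium: Optional[str] = None,
    exists: Callable[[str], bool] = os.path.exists,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    # 1. Known system install locations.
    for path in SYSTEM_CANDIDATES:
        if exists(path):
            return path
    # 2. Anything discoverable on PATH.
    for name in PATH_NAMES:
        found = which(name)
        if found:
            return found
    # 3. Fall back to Playwright's bundled Chromium, if the caller located one.
    if playwright_chromium and exists(playwright_chromium):
        return playwright_chromium
    return None
```

The `exists` and `which` parameters are injectable only so the lookup order can be exercised without a real browser installed.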
- ESLint Configuration Deprecation Warning
- Fixed ".eslintignore file is no longer supported" warning in ESLint 9+
- Migrated all ignore patterns from `.eslintignore` to `eslint.config.mjs` `ignores` array
- Removed deprecated `.eslintignore` files from all tier-1-essential templates
- Added comprehensive ignore patterns: playwright-report, test-results, .stryker-tmp, .lighthouseci, etc.
- Affects: All Next.js stacks (fullstack_nextjs, saas_t3, dashboard_refine)
- Template File Formatting
- Fixed Prettier formatting issues in template files
- Re-formatted all template files using project's `.prettierrc` config (printWidth: 100)
- Previously formatted with default Prettier settings causing format check failures
- Affects: All stacks
- Test Script Lighthouse CI Support
- Added `lighthouse.yml` workflow parsing to test_all_templates.py
- Lighthouse CI checks now run for tier-4-production projects regardless of ci_cd option
- Fixed early return condition that skipped workflow checks when ci_cd option not selected
Added
- README Documentation for E2E, A11y, and Lighthouse
- Added Accessibility Testing section to README for projects with `a11y` option
- Added Lighthouse CI section to README for tier-4-production projects
- Documents that Playwright's Chromium is used automatically for Lighthouse
- New User VM Test Guide
- Added comprehensive testing guide: `analysis-docs/NEW_USER_VM_TEST_GUIDE.md`
- Step-by-step instructions for testing Solokit on fresh GCP VMs
- Automated test script: `scripts/test-new-user-experience.sh`
- Tests all 4 stacks with tier-4 and all options
- Phase 4 Test Failures for Next.js Stacks
- Fixed fullstack_nextjs mutation test failures by creating tier-specific Jest environment configurations
- Added tier-3 test file overrides using `@stryker-mutator/jest-runner/jest-env/node` for API route tests
- Kept tier-1/tier-2 test files with standard `@jest-environment node` for compatibility
@jest-environment nodefor compatibility - Fixed Lighthouse CI workflow placement: moved from a11y option to tier-4-production (where script exists)
- Created dedicated lighthouse.yml workflows in tier-4-production for all Next.js stacks
- Added PORT environment variable support to Playwright configs for parallel test execution
- Updated test script to provide DATABASE_URL environment variable for Next.js dev servers
- Optimized test timeouts: mutation tests (600s), regular tests (300s), default (120s)
- Impact: All 192 phase-4 tests now passing across all stacks (saas_t3, dashboard_refine, fullstack_nextjs, ml_ai_fastapi)
- Affects: fullstack_nextjs, saas_t3, dashboard_refine (all tiers with ci_cd option)
- Files modified:
- Template files: 14 files (playwright configs, test files, workflow files)
- Test infrastructure: test_all_templates.py (environment and timeout configuration)
- Tier-Aware CI/CD Workflows for ml_ai_fastapi
- Made CI/CD workflows respect tier-based tool availability to prevent false failures
- Added conditional execution for pylint duplicate code check (tier-3+ only) using `if: hashFiles('.pylintrc') != ''`
- Added conditional execution for cosmic-ray mutation tests (tier-3+ only) via config detection step
- Added conditional execution for Bandit security linting (tier-2+ only) via config detection step
- Updated test script to evaluate GitHub Actions `hashFiles()` expressions and skip steps with failing conditions
- Updated test script to skip conditional check steps (e.g., "Check if cosmic-ray config exists")
- Added `--stack` option to test script for running all phase-4 tests for a specific stack (48 tests per stack)
- Impact: Prevents tier-1 and tier-2 tests from failing due to missing tier-3 quality tools
- Affects: ml_ai_fastapi (all tiers with ci_cd option)
- Fixed 24 out of 48 ml_ai_fastapi phase-4 test failures
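To illustrate how a local test script might approximate the `hashFiles()` condition mentioned in this entry, here is a small sketch. It handles only the single form quoted above and is not the logic in test_all_templates.py; the function name, the regular expression, and the choice to run steps with unrecognized conditions are assumptions.

```python
import re
from pathlib import Path

# Matches only the simple form quoted in the entry: hashFiles('<glob>') != ''
_HASHFILES_IF = re.compile(r"""^\s*hashFiles\(\s*'([^']+)'\s*\)\s*!=\s*''\s*$""")


def step_should_run(condition: str, project_root: Path) -> bool:
    """Approximate a GitHub Actions `if:` of the form hashFiles('x') != ''."""
    match = _HASHFILES_IF.match(condition)
    if match is None:
        return True  # unrecognized expression: run the step rather than guess
    # hashFiles() yields an empty string when no file matches the pattern,
    # so the condition is true exactly when at least one file matches.
    return any(p.is_file() for p in project_root.glob(match.group(1)))
```

With this check, a tier-1 project that has no `.pylintrc` skips the duplicate-code step locally, mirroring what the workflow condition does on GitHub Actions.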
Added
- Code Duplication Detection for Python Stack (Session 4)
- Added pylint 3.3.3 with duplicate-code checking to ml_ai_fastapi tier-3 dependencies
- Created `.pylintrc` configuration for code duplication thresholds
- Added duplication check step to ml_ai_fastapi quality-check.yml workflow
- Updated stack-versions.yaml with pylint version and installation command
- Updated test script to recognize and run pylint commands
- Impact: All 4 stacks now have consistent code duplication detection at tier-3+
- JavaScript stacks: jscpd, Python stack: pylint
- Resolves: Session 4 (Code Duplication Detection) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
- Type Coverage Enforcement for Python Stack (Session 5)
- Added mypy type coverage check to ml_ai_fastapi quality-check.yml workflow
- Configured mypy with --disallow-untyped-defs and --disallow-incomplete-defs flags
- Updated test script to recognize and run mypy commands
- Impact: All 4 stacks now enforce type coverage at tier-3+
- JavaScript stacks: type-coverage tool (95%), Python stack: mypy strict checking
- Resolves: Session 5 (Type Coverage Enforcement) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
- Comprehensive Unit Tests for Python Stack (Session 6)
- Added 29 comprehensive unit tests achieving 94.54% coverage (up from 75.63%)
- Created test_api_routes.py: Tests for API endpoints and health checks with error scenarios
- Created test_database.py: Tests for database connections and dependency injection with mocking
- Created test_main.py: Tests for application startup, lifespan, and API documentation endpoints
- Coverage breakdown: Dependencies (100%), Database (100%), Main app (100%), Models (100%), Services (100%)
- Impact: Any coverage threshold (60%, 80%, 90%) selected during initialization will now pass
- Resolves: Session 6 (Unit Tests with Coverage) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
- Integration Tests Verification (Session 7)
- Verified all 4 stacks have integration tests properly configured in tier-3-comprehensive
- Confirmed integration test scripts in package.json/pytest for all stacks
- Verified test.yml workflows run integration tests without continue-on-error flags
- JavaScript stacks: Jest/Vitest with integration test directories
- Python stack: pytest with real HTTP client integration tests
- Impact: All stacks at tier-3+ have working integration tests in CI
- Resolves: Session 7 (Integration Tests) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
Fixed
- E2E Tests Failing in fullstack_nextjs (Session 8)
- Fixed PrismaClientInitializationError during E2E tests caused by database queries in Server Components
- Modified app/page.tsx to gracefully handle database connection errors with try-catch
- Added fallback data for E2E test environment when database is unavailable
- Fixed TypeScript type errors in `lib/__tests__/prisma.test.ts` using Object.defineProperty
- Impact: E2E tests now pass without requiring database setup, matching saas_t3/dashboard_refine patterns
- Root cause: fullstack_nextjs uses Server Components with direct Prisma queries (unlike other stacks)
- Affects: fullstack_nextjs template, all tiers
- Resolves: Session 8 (E2E Tests) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
- Mutation Testing Configuration Consistency (Session 9)
- Fixed ml_ai_fastapi mutation testing to match JavaScript stack tiering pattern
- Replaced mutmut with Cosmic Ray 8.4.3 (mutmut incompatible with src/ directory layouts)
- Removed [cosmic-ray] configuration from base, tier-1, and tier-2 templates
- Updated tier-3 and tier-4 pyproject.toml to use cosmic-ray==8.4.3 in quality dependencies
- Deleted obsolete mutmut_config.py.template from tier-3-comprehensive
- Updated CI/CD workflow to use cosmic-ray commands (cosmic-ray init/exec/cr-report)
- Updated stack-versions.yaml with cosmic-ray 8.4.3
- Verified cosmic-ray session creation (200 mutation jobs) on test projects
- Added `__tests__` pattern to Jest testMatch in 6 JavaScript configs for better test discovery
- Enhanced saas_t3 test coverage to 92.85% function coverage
- Impact: Mutation testing now introduced in tier-3, inherited by tier-4 (consistent across all stacks)
- Affects: ml_ai_fastapi (all tiers), saas_t3/dashboard_refine/fullstack_nextjs (tier-3/tier-4)
- Resolves: Session 9 (Mutation Testing) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
- Production Build Quality Gate (Session 10)
- Added explicit production build step to quality-check.yml for all 3 Next.js stacks
- Production build now runs as final quality gate before PR merge
- Updated test script to handle build steps in a context-aware way (skip in setup jobs, run in quality jobs)
- Enhanced npm command parsing to handle all npm commands (not just npm run)
- Impact: Build failures now caught during quality checks, preventing broken production builds
- Affects: saas_t3, dashboard_refine, fullstack_nextjs (all tiers with CI/CD option)
- Resolves: Session 10 (Production Build) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
- Bundle Analysis Integration (Session 11)
- Integrated @next/bundle-analyzer in next.config.ts for all tier-4 Next.js stacks
- Bundle analyzer enabled via ANALYZE=true environment variable
- Bundle analysis job added to build.yml workflow (uploads artifacts for 30 days)
- Updated test script to parse security.yml and build.yml workflows
- Impact: All tier-4 production templates now have bundle size monitoring in CI/CD
- Affects: saas_t3/tier-4-production, dashboard_refine/tier-4-production, fullstack_nextjs/tier-4-production
- Resolves: Session 11 (Bundle Analysis) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
- Security Scanning Enforcement (Session 12)
- Removed continue-on-error flags from all security checks across all 4 stacks
- JavaScript stacks: npm audit and dependency-review-action now fail CI on vulnerabilities
- Python stack: Bandit, pip-audit, and Semgrep now fail CI on security issues
- Fixed .bandit configuration syntax from INI/Python hybrid to proper YAML format
- Added .eslintignore files to all JavaScript stacks to exclude generated report files
- Updated .prettierignore to include report/ directory in all JavaScript stacks
- Removed duplicate quality checks (type check, lint) from build.yml in all Next.js stacks
- Added security.yml workflow to test script parsing
- Impact: Security vulnerabilities now block CI/CD pipeline instead of being warnings
- Affects: All 4 stacks (saas_t3, dashboard_refine, fullstack_nextjs, ml_ai_fastapi), all tiers with CI/CD
- Resolves: Session 12 (Security Scanning) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
- Accessibility Testing Infrastructure (Session 13)
- Tagged all accessibility tests with @a11y marker for proper test discovery
- saas_t3: 1 test tagged (home.spec.ts)
- fullstack_nextjs: 1 test tagged (flow.spec.ts)
- dashboard_refine: 3 tests tagged (dashboard.spec.ts, user-management.spec.ts)
- npm run test:a11y now correctly finds and runs all 5 accessibility tests
- Tests use @axe-core/playwright to scan for WCAG 2.0/2.1 Level A & AA violations
- Impact: Accessibility testing workflow now functional for all Next.js stacks
- Affects: saas_t3, dashboard_refine, fullstack_nextjs (tier-3+, a11y option)
- Resolves: Session 13 (Accessibility Testing) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
- Lighthouse CI Configuration & Build Workflow Optimization (Session 14)
- Removed overly strict "lighthouse:recommended" preset from Lighthouse configuration
- Now uses explicit assertions for meaningful metrics: 90% category scores + Core Web Vitals
- Increased LCP threshold for dashboard_refine to 3500ms (accounts for Refine framework overhead)
- Removed duplicate build job from build.yml workflow in all Next.js stacks
- Simplified build.yml to only bundle-analysis job (builds independently with ANALYZE=true)
- Added Prisma client generation step to quality-check.yml for saas_t3 and fullstack_nextjs
- Production build now solely in quality-check.yml as proper quality gate
- Impact: Cleaner CI/CD workflows, no redundant builds, Lighthouse focuses on practical metrics
- Affects: saas_t3, dashboard_refine, fullstack_nextjs (tier-4-production, a11y option)
- Resolves: Session 14 (Lighthouse CI) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
- Complexity Analysis Enforcement (Session 15)
- Added ESLint complexity rules to all JavaScript stack tier-3 and tier-4 configs
- JavaScript complexity rules: cyclomatic (max 10), max-depth (4), max-nested-callbacks (4), max-lines-per-function (100)
- Test files exempted from max-nested-callbacks and max-lines-per-function (describe/it blocks naturally nest deeply)
- Updated Python Radon check to enforce thresholds: `radon cc --max B` fails build if complexity exceeds grade B
- ESLint handles both linting and complexity for JavaScript (industry standard approach)
- Radon provides separate complexity analysis for Python (distinct from ruff linting)
- Impact: All stacks at tier-3+ now enforce code complexity standards to maintain readability
- JavaScript: Complexity violations show as ESLint errors during `npm run lint`
- Python: Complexity violations fail `radon cc` step in quality-check.yml workflow
- Affects: All 4 stacks (saas_t3, dashboard_refine, fullstack_nextjs, ml_ai_fastapi), tier-3+
- Resolves: Session 15 (Complexity Analysis) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
- Dead Code Detection (Session 16)
- Added ts-prune check to all JavaScript stack quality-check.yml workflows
- JavaScript: `npm run check:unused` runs ts-prune to detect unused exports
- ts-prune already installed in tier-3 package.json (version 0.10.3) with check:unused script
- Updated Python Vulture configuration to reduce false positives
- Increased Vulture min-confidence from 80 to 90 (fewer false positives)
- Added explicit excludes for tests, `__pycache__`, and alembic/versions
- Added ignore-names for common protocol parameters (method_name, logger) required by frameworks
- Fixed test script to use shlex.split() instead of str.split() for proper handling of quoted arguments
- Impact: All stacks at tier-3+ now detect and prevent dead/unused code
- JavaScript: ts-prune detects unused exports during quality checks
- Python: Vulture detects unused code in src/ with 90% confidence threshold
- Affects: All 4 stacks (saas_t3, dashboard_refine, fullstack_nextjs, ml_ai_fastapi), tier-3+
- Resolves: Session 16 (Dead Code Detection) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
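
The `shlex.split()` fix noted above matters whenever a workflow command carries a quoted argument. A minimal illustration follows; the command string is a made-up example, not one taken from the templates or the test script.

```python
import shlex

# Hypothetical workflow command with a quoted argument (illustration only).
command = 'pytest -k "unit and not slow" tests/'

# str.split() breaks on every space and keeps the quote characters,
# so the -k expression is scattered across four tokens.
naive = command.split()

# shlex.split() follows POSIX shell quoting rules and keeps the quoted
# expression together as a single argument with the quotes removed.
parsed = shlex.split(command)

print(naive)   # ['pytest', '-k', '"unit', 'and', 'not', 'slow"', 'tests/']
print(parsed)  # ['pytest', '-k', 'unit and not slow', 'tests/']
```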
- Template Registry Documentation Update (Session 18)
- Updated template-registry.json to accurately reflect all implemented quality checks
- Tier-3 comprehensive now documents all tools: ts-prune, jscpd, vulture, pylint, Radon, cosmic-ray
- Tier-4 production now includes: Bundle analysis, Lighthouse CI, structured logging
- Added stack_specific sections to clarify JavaScript vs Python tooling differences
- Updated tier-3 to specify "E2E tests (Playwright for JS stacks)" - Python stack uses integration tests
- Updated tier-4 to document actual features: Bundle analysis (@next/bundle-analyzer), Lighthouse CI (90% scores)
- Changed tier-4 description from "Operations + Deployment" to "Operations + Monitoring + Performance"
- Updated metadata last_updated date to 2025-11-19
- Impact: Documentation now matches implementation, users know exactly what tools are used at each tier
- Affects: template-registry.json (user-facing documentation)
- Resolves: Session 18 (Update Template Registry Documentation) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
Fixed (from previous sessions)
- Template Type Check Failures Across All Stacks
- Fixed dashboard_refine: Refine v5 Pagination API changed from `current` to `currentPage`
- Updated `lib/__tests__/refine.test.tsx` to use correct Pagination interface
- Fixed fullstack_nextjs: TypeScript read-only property errors in test environment setup
- Updated `lib/__tests__/prisma.test.ts` to use type assertions for `process.env.NODE_ENV`
- Fixed ml_ai_fastapi: Pyright unable to resolve imports (missing venv configuration)
- Added `venvPath` and `venv` settings to `pyrightconfig.json`
- Fixed import ordering issues in ml_ai_fastapi template files (13 files)
- Impact: All CI/CD type checks now pass for tier-4-production across all stacks
- Affects: All templates (dashboard_refine, fullstack_nextjs, saas_t3, ml_ai_fastapi), all tiers
- Resolves: Session 2 (Type Checking) from TEMPLATE_CONSISTENCY_AUDIT_PLAN.md
- Pre-commit Option Conflict with Tier-2+ Templates
- Removed redundant "Pre-commit" additional option from initialization
- Tier-2+ templates already include Husky git hooks built-in (.husky/pre-commit)
- Python pre-commit framework option was creating duplicate hook systems
- Deleted all template `pre-commit/` directories (saas_t3, ml_ai_fastapi, fullstack_nextjs, dashboard_refine)
- Updated init.py to show only 3 additional options: CI/CD, Docker, Env Templates
- Updated template-registry.json and template_installer.py
- Updated all documentation and command files to reflect the change
- Updated all test files to remove pre_commit from test cases
- Impact: Cleaner user experience, no conflicting hook systems, tier-1 users can upgrade to tier-2 for git hooks
- Affects: All templates, all tiers
- Rationale: Tier-2+ already has Husky (90% of users), tier-1 users can upgrade to tier-2, advanced users can manually install Python pre-commit if needed
- Critical Tier-3 and Tier-4 Test Suite Failures
- Fixed Jest configuration to exclude Playwright e2e tests from Jest runs (6 files)
- Added `testMatch` to only run unit and integration tests
- Added `testPathIgnorePatterns` to exclude `/tests/e2e/`
- Added `transformIgnorePatterns` for ESM dependencies (superjson, @trpc)
- Added `moduleNameMapper` for path aliases
- Created dedicated `jest.config.ts` files for tier-3 and tier-4 templates
- Fixed package.json test scripts to separate test types (6 files)
- Added `test:unit` - Run unit tests only
- Added `test:integration` - Run integration tests only
- Added `test:e2e` - Run Playwright e2e tests only
- Added `test:all` - Run all test types sequentially
- Updated tier-3 and tier-4 package.json templates for all stacks
- Replaced broken integration test examples with working placeholders (3 files)
- Removed ESM server imports that caused "Cannot use import statement outside a module" errors
- Added educational placeholder tests demonstrating proper structure
- Tests now pass immediately after project initialization
- Impact: All tier-3 and tier-4 projects now have working test suites out of the box
- Affects: saas_t3, fullstack_nextjs, dashboard_refine templates
- Resolves: "TransformStream is not defined" and ESM import errors
Added
- Urgent Flag for Single Immediate-Priority Work Items
- Added `--urgent` flag to `sk work-new` command for marking work items that require immediate attention
- Exclusive single-item constraint: only ONE work item can be urgent at a time
- User confirmation prompt when setting a new urgent item (clears existing urgent flag)
- Urgent items override all priority levels and dependency ordering
- Visual ⚠️ indicator in `sk work-list` output for urgent items
- `sk work-next` always returns urgent items first, ignoring dependencies
- Added `--clear-urgent` flag to `sk work-update` command for manual clearing
- Auto-clear urgent flag when work item status changes to completed
- Added urgent status question to `/work-new` slash command (interactive UI)
- Backward compatible: work items without urgent field default to `false`
- Added 24 unit tests (repository, scheduler, updater) with 90%+ coverage
- Added 11 integration tests for end-to-end urgent workflow
- Updated command documentation: work-new.md, work-update.md, work-list.md, work-next.md
- Essential CLI Commands: help, version, doctor, config show
- Added `sk help` command to display all commands organized by category
- `sk help <command>` shows detailed help for specific commands with usage, options, and examples
- Global `--help` and `-h` flags supported
- Created `src/solokit/commands/help.py` with comprehensive command documentation
- Added `sk version` command to display version information
- Shows Solokit version, Python version, and platform
- Global `--version` and `-V` flags supported
- Created `src/solokit/commands/version.py`
- Added `sk doctor` command for comprehensive system diagnostics
- Checks Python version (>= 3.11.0), git installation, project structure
- Validates config.json and work_items.json integrity
- Verifies quality tools availability (pytest, ruff)
- Provides actionable suggestions for failed checks
- Returns exit code 0 if all pass, 1 if any fail
- Created `src/solokit/commands/doctor.py`
- Added `sk config show` command to display configuration
- Shows config file path and formatted configuration
- Validates configuration and displays status
- `--json` flag for machine-readable output
- Created `src/solokit/commands/config.py`
- Updated CLI routing in `src/solokit/cli.py` to support new commands
- Running `sk` with no arguments now shows help (instead of error)
- Added 27 new unit tests for all commands (100% passing)
- Updated README.md with utility commands section
- Updated docs/guides/troubleshooting.md to reference `sk doctor` as first troubleshooting step
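
The `sk doctor` contract described above (run a set of checks, exit 0 only if all pass) can be sketched roughly as follows. This is an illustration under assumptions, not the contents of `doctor.py`: the function names and the output format are hypothetical, and only the Python version and git checks are shown.

```python
from __future__ import annotations

import shutil
import sys
from typing import Callable

MIN_PYTHON = (3, 11, 0)  # minimum version stated in the changelog entry


def check_python_version(current: tuple[int, int, int] | None = None) -> bool:
    """Return True when the interpreter meets the minimum version."""
    version = current if current is not None else tuple(sys.version_info[:3])
    return version >= MIN_PYTHON


def check_git_installed() -> bool:
    """Return True when a git executable can be found on PATH."""
    return shutil.which("git") is not None


def run_checks(checks: dict[str, Callable[[], bool]]) -> int:
    """Run every check, print a status line, and return 0 if all pass, else 1."""
    all_passed = True
    for name, check in checks.items():
        passed = check()
        print(f"[{'ok' if passed else 'FAIL'}] {name}")
        all_passed = all_passed and passed
    return 0 if all_passed else 1


if __name__ == "__main__":
    sys.exit(run_checks({
        "Python >= 3.11.0": check_python_version,
        "git installed": check_git_installed,
    }))
```

Every check runs even after an earlier one fails, so a single invocation reports all problems at once.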
Fixed
- High: Urgent Flag Not Cleared on Session Completion
- Fixed urgent flag persisting on completed work items when using `sk end` command
- Session completion now uses WorkItemUpdater instead of direct JSON manipulation
- Ensures urgent flag is automatically cleared when work item status changes to completed
- Behavior now consistent with `sk work-update <id> --status completed`
- Updated `src/solokit/session/complete.py` to use repository pattern for status updates
- Added `--set-urgent` flag to `sk work-update` command for setting urgent status
- Allows promoting existing work items to urgent status
- Automatically clears urgent flag from other items (single-item constraint)
- Complements existing `--clear-urgent` flag for complete CLI control
- Updated `src/solokit/work_items/updater.py` with set_urgent field handling
- Updated help documentation for urgent flags
- Added `--set-urgent` and `--clear-urgent` to work-update command help
- Added `--urgent` flag to work-new command help examples
- Updated `src/solokit/commands/help.py` with complete option descriptions
- Updated `.claude/commands/work-update.md` and template version
- Added integration test for urgent flag clearing on session completion
- New test: `test_auto_clear_urgent_on_session_completion` in test_urgent_workflow.py
- Verifies end-to-end workflow with session completion
- All 12 urgent workflow integration tests passing
- Impact: Work lists now correctly show/hide ⚠️ symbol based on actual urgent status
- Users no longer need manual cleanup after completing urgent work items
- Complete CLI support for urgent flag lifecycle (create, set, clear, auto-clear)
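
The urgent flag lifecycle described in this entry (single-item constraint, auto-clear on completion, `false` default for older items) can be sketched as below. The dictionary layout and function names are assumptions made for illustration; the real logic lives in WorkItemUpdater and may differ.

```python
from __future__ import annotations


def set_urgent(work_items: dict[str, dict], item_id: str) -> None:
    """Mark one item urgent and clear the flag on every other item."""
    if item_id not in work_items:
        raise KeyError(item_id)
    for other_id, item in work_items.items():
        item["urgent"] = other_id == item_id


def update_status(work_items: dict[str, dict], item_id: str, status: str) -> None:
    """Update the status; completing an item also drops its urgent flag."""
    item = work_items[item_id]
    item["status"] = status
    if status == "completed":
        item["urgent"] = False


def is_urgent(item: dict) -> bool:
    """Items written before the urgent field existed default to False."""
    return item.get("urgent", False)


# Hypothetical work items; "a" predates the urgent field.
items = {
    "a": {"status": "in_progress"},
    "b": {"status": "not_started", "urgent": True},
}
set_urgent(items, "a")                  # "a" becomes urgent, "b" is cleared
update_status(items, "a", "completed")  # completion clears the flag on "a"
```

Routing every status change through one function is what keeps `sk end` and `sk work-update` consistent: neither path can complete an item without the flag being cleared.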
- Critical: Next.js 16 Template Initialization Issues
- Fixed missing ts-node dependency causing Jest to fail parsing TypeScript config files
- Added `"ts-node": "10.9.2"` to devDependencies in all 15 Next.js package.json templates
- Affects all 3 Next.js templates (saas_t3, fullstack_nextjs, dashboard_refine) × 5 tiers each
- Resolves error: `Jest: 'ts-node' is required for the TypeScript configuration files`
- Fixed deprecated `next lint` command removed in Next.js 16
- Changed `"lint": "next lint"` to `"lint": "eslint ."` in all 15 templates
- Updated `"lint:fix"` script in dashboard_refine templates to use direct ESLint
- Resolves cryptic error: `Invalid project directory provided, no such directory: .../lint`
- Fixed ESLint 9 incompatibility with legacy config format
- Replaced `.eslintrc.json` (legacy) with `eslint.config.mjs` (flat config) in all 3 templates
- Added `"globals": "16.5.0"` package to tier-1-essential in all 3 templates
- Configured proper globals for Node.js, browser, React, and Jest environments
- Resolves error: `ESLint couldn't find an eslint.config.(js|mjs|cjs) file`
- Fixed linting validation being skipped during quality gates check
- Updated `src/solokit/init/session_structure.py` to include linting commands in quality gates config
- Added `commands` section with language-specific linting commands (python, javascript, typescript)
- Validation now properly runs `npm run lint` instead of reporting "no command for typescript"
- Fixed linting errors in template example code
- Removed unused `ctx` parameters from tRPC example router (saas_t3 template)
- Template code now passes linting without errors after initialization
- Impact: All 3 Next.js templates now work correctly across all quality tiers (base through tier-4)
- Users can successfully initialize projects without manual workarounds
- Quality gates validation (`/validate`, `/end`) now properly check linting instead of skipping
- Linting works out-of-the-box with ESLint 9 flat config
- All 2,936 tests passing with zero regressions
- Critical: CI/CD Workflow Failures in Template Projects
- Fixed CodeQL permission error causing Security workflow to fail on push to main
- Added `actions: read` permission to CodeQL jobs in all template security.yml files
- Resolves error: "Resource not accessible by integration" when accessing workflow metadata
- Affects: saas_t3, fullstack_nextjs, dashboard_refine templates
- Fixed CodeQL and secrets-scan jobs running on pull requests without required permissions
- Added `if: github.event_name != 'pull_request'` conditional to skip on PRs
- These jobs require write permissions not available in PRs from forks
- Prevents workflow failures on external contributions
- Affects: saas_t3, fullstack_nextjs, dashboard_refine templates
- Fixed dependency-review failing on repositories without GitHub Advanced Security
- Added `continue-on-error: true` to dependency-review step
- Allows workflow to pass even when Advanced Security is not available (free repositories)
- Resolves error: "Dependency review is not supported on this repository"
- Affects: saas_t3, fullstack_nextjs, dashboard_refine templates
- Fixed Deploy workflow failures when production secrets are not configured
- Added conditionals to skip deployment steps when secrets are empty/missing
- Database migrations: `if: ${{ secrets.DATABASE_URL != '' }}`
- Vercel deployment: `if: ${{ secrets.VERCEL_TOKEN != '' }}`
- Sentry releases: `if: ${{ secrets.SENTRY_AUTH_TOKEN != '' }}`
- Lighthouse CI: `if: ${{ secrets.LHCI_GITHUB_APP_TOKEN != '' }}`
- Python templates: STAGING_DATABASE_URL, RAILWAY_TOKEN, DOCKER_REGISTRY, DEPLOY_KEY
- Affects: saas_t3, fullstack_nextjs, dashboard_refine, ml_ai_fastapi templates
- Fixed missing npm script errors in test and build workflows
- Changed to `npm run --if-present test:integration` for integration tests
- Changed to `npm run --if-present test:e2e` for E2E tests
- Changed to `npm run --if-present analyze` for bundle analysis
- Scripts gracefully skip if not defined in package.json (tier-1/tier-2 projects)
- Resolves errors: "Missing script: test:integration/test:e2e/analyze"
- Affects: saas_t3, fullstack_nextjs, dashboard_refine templates
- Impact: New projects can now merge PRs without CI failures
- All CI workflows pass on tier-1-essential projects (base configuration)
- Deploy workflows gracefully skip steps when production infrastructure isn't configured yet
- Users can set up production secrets and advanced test suites incrementally without errors
- Fixed 11 workflow files across 4 templates (security.yml, deploy.yml, test.yml, build.yml)
- Critical: Phase 2 Terminal Testing - Final 11 UX Issues (All 18 Issues Now Complete)
- Fixed `.session/` directory causing uncommitted changes warnings (#9 - Critical)
- Added `.session/` to .gitignore in all 4 stack templates (saas_t3, ml_ai_fastapi, dashboard_refine, fullstack_nextjs)
- Templates now properly exclude session tracking from git by default
- Fixed DOT syntax error in work-graph SVG generation (#4/#5 - Critical)
- Changed from invalid `"bold, color=red"` to valid DOT syntax `'style=bold, color=red'`
- Updated `src/solokit/visualization/dependency_graph.py:169`
- SVG graph generation now works correctly with Graphviz
- Changed uncommitted changes from ERROR to INFO level in sk start (#8 - High)
- Updated `src/solokit/session/briefing/git_context.py` to handle WorkingDirNotCleanError gracefully
- Users no longer see ERROR logs for normal uncommitted changes during development
- Added progress messaging and Claude Code promotion to sk init (#1 - High)
- Added initial progress message during initialization
- Changed final messages to use `output.info()` instead of `logger.info()` for visibility
- Updated `src/solokit/init/orchestrator.py` with user-facing completion summary
- Added warning when dependency already exists in work-update (#2 - Medium)
- Shows `output.warning("Dependency 'X' already exists (skipped)")` instead of silently skipping
- Updated `src/solokit/work_items/updater.py`
- Replaced verbose output with compact table format in work-next (#6 - Medium)
- New table shows ID, Type, Priority, Status, Blocks, and Title columns
- Displays top 5 ready items and top 3 blocked items
- Arrow (→) marks recommended item, updated `src/solokit/work_items/scheduler.py`
- Added interactive prompt to work-delete when no flags provided (#12 - Medium)
- Users now get choices: 1=keep spec, 2=delete spec, 3=cancel
- No longer requires --keep-spec or --delete-spec flags (but still accepts them)
- Updated `src/solokit/work_items/delete.py` with user-friendly menu
- Removed redundant ERROR/WARNING logs in edge cases (#14/#15/#16 - Medium)
- Removed duplicate logging before user-facing error messages
- Updated `query.py`, `updater.py`, and `delete.py` to avoid log duplication
- Changed "No changes made" to "No changes to update" for clarity
- Updated work-graph to use HelpfulArgumentParser for better errors (#17 - Medium)
- Invalid format errors now show examples instead of raw argparse output
- Updated `src/solokit/visualization/dependency_graph.py`
- Improved "no results" message in learn-search (#11 - Low)
- Now suggests trying different keywords or browsing all learnings
- Updated `src/solokit/learning/reporter.py`
- Added validation for empty query in learn-search (#18 - Low)
- Shows error with examples when query is empty or whitespace-only
- Updated `src/solokit/learning/curator.py`
- Test updates: Fixed 1 test in `test_briefing_generator.py` to match new git status message
- All 2,388 tests passing with zero regressions
- Quality checks: All ruff linting passed, all formatting compliant, all mypy checks passed
- Impact: Completes all 18 Phase 2 terminal testing issues for professional CLI UX
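
For context on the DOT syntax fix above: attribute lists in the DOT language consist of `key=value` pairs, so a bare `bold` is rejected while `style=bold` is accepted. A small sketch of emitting a highlighted edge follows; the function names and graph name are hypothetical and not taken from `dependency_graph.py`.

```python
from __future__ import annotations


def format_edge(source: str, target: str, critical: bool = False) -> str:
    """Return one DOT edge statement, highlighting critical edges."""
    # Every attribute is a key=value pair; "bold" alone is not valid DOT.
    attrs = " [style=bold, color=red]" if critical else ""
    return f'  "{source}" -> "{target}"{attrs};'


def to_dot(edges: list[tuple[str, str, bool]]) -> str:
    """Wrap edge statements in a digraph block."""
    lines = ["digraph work_items {"]
    lines += [format_edge(src, dst, crit) for src, dst, crit in edges]
    lines.append("}")
    return "\n".join(lines)


print(to_dot([("auth", "login", True), ("login", "profile", False)]))
```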
- Critical: Phase 2 Terminal Testing - Clean Output, Archiver Fix, and Briefing Improvements
- Fixed log leakage issue where INFO/WARNING/ERROR logs appeared in all commands without --verbose flag
- Changed default CLI log level from INFO to ERROR for clean terminal output
- Removed redundant logging configuration from `validate.py`
- Updated `src/solokit/cli.py` to set ERROR level by default, DEBUG with --verbose
- Only ERROR and above messages shown to users unless explicitly requesting verbose mode
- Fixed archiver type comparison error causing learning curation to fail
- Updated `src/solokit/learning/archiver.py` to handle new session dict format
- Changed from comparing dict objects directly to extracting `session_num` field first
- Resolves `'>' not supported between instances of 'dict' and 'int'` error
- Fixed work-list count logic to include blocked items in not_started category
- Updated `src/solokit/work_items/query.py` to count items by actual status
- Blocked is now correctly treated as a property, not a separate status
- Count math now accurate: total = in_progress + not_started + completed
- Added template comment stripping to briefing output for cleaner specs
- Created `strip_template_comments()` method in `src/solokit/session/briefing/formatter.py`
- Removes HTML comments, placeholder text, and excessive blank lines from specs
- Briefings now ~5x shorter and more readable without template cruft
- Verified work-graph documentation already matches implementation (ascii, dot, svg formats)
- Added comprehensive regression test suite: `tests/integration/test_phase_2_terminal_fixes.py`
- 15 new tests covering all 5 issues
- Updated 5 existing tests to use new session dict format
- All 2,388 tests passing with zero regressions
- Quality checks: All ruff linting passed, all formatting compliant, all mypy checks passed
- Impact: Resolves 5 critical Phase 2 terminal testing issues for professional CLI UX
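
The archiver error above is the standard Python failure when a dict is compared with an int. A minimal sketch of the "extract `session_num` first" approach is shown here; the helper names are hypothetical, and accepting plain ints alongside dicts is an assumption rather than documented archiver behavior.

```python
from __future__ import annotations


def session_number(entry: dict | int) -> int:
    """Normalize a session entry to its number.

    Assumption: older entries are plain ints, newer ones are dicts
    carrying a "session_num" field.
    """
    if isinstance(entry, dict):
        return int(entry["session_num"])
    return entry


def sessions_after(entries: list[dict | int], threshold: int) -> list[int]:
    """Return the session numbers greater than threshold."""
    # Comparing a raw dict entry with threshold would raise TypeError;
    # normalizing first keeps the comparison int-to-int.
    return [n for n in map(session_number, entries) if n > threshold]


print(sessions_after([{"session_num": 3}, 7, {"session_num": 12}], 5))  # [7, 12]
```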
- Critical: Phase 1 Terminal Testing - Error Messaging & UX Improvements
- Fixed missing `jsonschema>=4.20.0` dependency causing all learning commands to fail
- Enhanced argparse error messages with helpful examples and next steps:
- Created `src/solokit/core/argparse_helpers.py` with `HelpfulArgumentParser` class
- Updated `sk work-new`, `sk work-show`, `sk work-update`, `sk work-delete` with example-rich epilogs
- All argparse errors now show full help text with examples instead of raw usage
- Improved Python binary detection for cross-platform compatibility:
- Created `src/solokit/core/system_utils.py` with `get_python_binary()` function
- Updated `get_metadata.py`, `get_next_recommendations.py`, `get_dependencies.py` to detect python vs python3
- Error messages now show correct binary based on system availability
- Added `--debug` flag to `sk validate` to hide stack traces from end users by default
- Implemented context-aware "no work item" error messages:
- `sk start`: Differentiates between "no items exist" vs "items exist but blocked"
- `sk status`: Shows total item count and actionable next steps
- `sk end`: Provides complete workflow guidance instead of "Work item not found: None"
- `sk work-next`: Helpful creation steps instead of generic "No work items found."
- `sk work-list`: Better message instead of wrong command reference "/work-item create"
- `sk work-graph`: Context-aware message differentiating no items vs filtered results
- All error messages now include:
- Numbered action steps for both terminal (`sk` commands) and Claude Code (slash commands)
- Emoji hints (⚠️, 💡) for visual guidance
- Specific next steps instead of generic warnings
- Test updates: Fixed 1 test in `test_status.py` to match improved error messages
- All 2,155 unit tests passing
- Impact: Resolves 13 out of 19 Phase 1 terminal testing issues
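
One common way to get "full help text instead of raw usage" on argparse errors is to override `ArgumentParser.error()`. The sketch below reuses the class name from this entry, but the body is an assumption about the general technique, not a copy of `argparse_helpers.py`; the epilog text and argument name are invented for illustration.

```python
import argparse
import sys


class HelpfulArgumentParser(argparse.ArgumentParser):
    """ArgumentParser that prints the full help, epilog included, on errors."""

    def error(self, message: str):
        # The default implementation prints only the short usage line.
        self.print_help(sys.stderr)
        self.exit(2, f"\nError: {message}\n")


parser = HelpfulArgumentParser(
    prog="sk work-show",
    epilog="Example:\n  sk work-show feature_login",  # hypothetical example text
    # Keep the epilog's line breaks instead of re-wrapping them.
    formatter_class=argparse.RawDescriptionHelpFormatter,
)
parser.add_argument("work_item_id")  # hypothetical positional argument
```

Calling `parser.parse_args([])` then prints the complete help with the example before exiting with status 2, the same exit code argparse uses by default.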
Added
- Feature: UX Enhancements - Logger Shortening, Interactive Prompts, and Claude Code Promotion
- Shortened logger names for better terminal readability (e.g., "orchestrator" vs "solokit.init.orchestrator")
- Added `questionary` library for rich interactive CLI prompts with styled UI components
- Created `src/solokit/core/cli_prompts.py` utility module with 4 reusable functions:
- `confirm_action()`: Styled confirmation prompts with default fallback
- `select_from_list()`: Single-select lists with arrow key navigation
- `multi_select_list()`: Multi-select checkboxes for multiple options
- `text_input()`: Text input with optional validation and defaults
- Replaced basic `input()` calls in `src/solokit/project/init.py` with questionary prompts:
- Template selection now uses interactive list selection
- Quality tier selection with rich descriptions
- Coverage target selection with visual list
- Additional options use multi-select checkboxes
- Final confirmation with styled yes/no prompt
- Added Claude Code promotion to initialization completion:
- Prominent messaging after `sk init` completes
- Lists key slash commands (/start, /end, /work-new, /work-list)
- Includes link to https://claude.com/claude-code
- Better flow: Claude Code promotion → Next Steps
- Enhanced README.md with Claude Code positioning:
- Added "💡 Best Used with Claude Code" hero section with Quick Start variant
- Enhanced Prerequisites to strongly recommend Claude Code (not just required)
- Added "vs. Using Claude Code Standalone" comparison explaining workflow benefits
- Repositioned documentation to emphasize Claude Code as primary interface
- All prompts gracefully fall back to defaults in non-interactive environments (CI/CD, piped stdin)
- Added EOF/KeyboardInterrupt error handling for robust test execution
- Test suite: 2,373 tests passing (added 17 new tests for cli_prompts module)
- Quality: All ruff linting passed, all mypy checks passed with modern type annotations
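
The non-interactive fallback described above can be sketched as follows. To stay standard-library only, plain `input()` stands in for the questionary prompt; the `interactive` parameter is a hypothetical addition for testability and is not claimed to be part of the real `confirm_action()` signature.

```python
from __future__ import annotations

import sys


def confirm_action(message: str, default: bool = False,
                   interactive: bool | None = None) -> bool:
    """Ask a yes/no question, returning default when no terminal is attached."""
    if interactive is None:
        interactive = sys.stdin.isatty()
    if not interactive:
        return default  # CI/CD or piped stdin: never block on a prompt
    try:
        # Stand-in for a styled questionary prompt.
        hint = "Y/n" if default else "y/N"
        answer = input(f"{message} [{hint}] ").strip().lower()
    except (EOFError, KeyboardInterrupt):
        return default  # closed stdin or Ctrl+C falls back to the default
    if not answer:
        return default
    return answer in ("y", "yes")
```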
Fixed
- Quality: Complete code quality and test suite cleanup
- Fixed all linting issues: Replaced deprecated `typing.List` with built-in `list` type in 3 template files
- Fixed all mypy type errors (17 errors across 6 files):
- Updated `pyproject.toml`: Replaced deprecated `strict_concatenate` with `extra_checks`
- Fixed `exceptions.py`: Changed implicit Optional `returncode: int = None` to explicit `int | None = None`
- Added type casting in `template_installer.py` and `dependency_installer.py` for `json.load()` and `yaml.safe_load()` returns
- Enhanced return type in `environment_validator.py`: `dict[str, bool | str]` → `dict[str, bool | str | None | list[str]]`
- Added Literal type casting in `orchestrator.py` for stack_type and tier parameters
- Fixed test failures (3 tests):
- Fixed mock fixtures: Changed `exit_code` to `returncode` in 6 test mocks
- Updated `conftest.py`: Aligned mock_stack_versions with actual stack-versions.yaml structure (base, tier1-4 instead of all_tiers/tier4)
- Removed all legacy init tests (12 tests deleted):
- Deleted `TestGitignoreGeneration` class (8 tests) from `test_init_workflow.py`
- Deleted `TestGitInitialization` class (3 tests) from `test_init_workflow.py`
- Deleted `TestCompleteInitWorkflow` test (1 test) from `test_init_workflow.py`
- Fixed E2E test fixtures to avoid legacy init (25 tests un-skipped):
- Updated fixtures in `test_core_session_workflow.py`, `test_learning_system.py`, `test_work_item_system.py`
- Fixtures now manually create `.session` directory structure instead of calling deprecated `sk init`
- Added all required tracking files with proper structure (work_items.json, learnings.json, status_update.json, stack.txt, tree.txt)
- Test suite results: 2,954 tests passing, 0 failed, 0 skipped (previously 2,368 passing, 35 skipped)
- Quality checks: All ruff linting passed, all 247 files formatted, all mypy checks passed (106 source files)
- Benefits: Clean codebase with modern Python type hints, zero legacy code, 100% test success rate
Added
- Feature: Claude Code Interactive UI Integration
  - Integrated Claude Code's `AskUserQuestion` tool to replace Python's interactive terminal prompts with rich UI components
  - Updated 6 slash commands with interactive workflows:
    - `/work-new`: Interactive dependency and metadata selection with AI-powered suggestions
    - `/work-update`: Multi-select field updates (status, priority, milestone, dependencies)
    - `/work-delete`: Shows dependent work items with warning before deletion
    - `/end`: Work item completion status selection (completed/in-progress/cancel)
    - `/learn`: AI-generated learning suggestions with multi-select capture
    - `/start`: Interactive work item recommendations (top 4 ready items by priority)
  - Created 4 optimization scripts to avoid reading full JSON files:
    - `get_metadata.py`: Fast work item metadata retrieval (~10 lines vs 1,751 lines)
    - `get_dependencies.py`: Quick dependency lookup with filtering and status
    - `get_dependents.py`: Find work items that depend on a given item
    - `get_next_recommendations.py`: Get top N ready work items by priority
  - Removed all Python `input()` calls from command modules (creator.py, updater.py, delete.py, complete.py)
  - All commands now require explicit CLI arguments with no interactive fallbacks
  - Updated command files (`.claude/commands/*.md`) with declarative AskUserQuestion workflows
  - Added 53 comprehensive unit tests for optimization scripts
  - All 2,226 tests passing (1,996 unit + 140 integration + 90 e2e)
  - Full type safety maintained with mypy strict mode
  - Benefits: Rich interactive UI for Claude Code users, better UX with multi-select options, AI-generated suggestions, optimized performance
Changed
- Session Completion: `/sk:end` now defaults to marking work items as completed
  - Non-interactive mode (e.g., when run by Claude Code) now defaults to marking work items as "completed" instead of "in-progress"
  - This aligns with the most common use case where developers end sessions after completing their work
  - Use the `--incomplete` flag explicitly to keep work items as "in-progress" for multi-session work
  - Interactive mode behavior unchanged (still defaults to completed as choice 1)
  - Updated `src/solokit/session/complete.py:943` to return `True` in non-interactive mode
  - Updated documentation in `.claude/commands/end.md` to reflect new default behavior
  - Updated test `test_prompt_non_interactive_defaults_true` in `tests/unit/session/test_complete.py`
Added
- Performance: Comprehensive optimization for session operations
  - Created `src/solokit/core/cache.py` with thread-safe TTL-based caching:
    - `Cache` class with get/set/invalidate/clear operations
    - `FileCache` class with automatic modification time tracking
    - Global cache instance accessible via `get_cache()`
  - Created `src/solokit/core/performance.py` for performance monitoring:
    - `@measure_time()` decorator for automatic function timing
    - `Timer` context manager for code block timing
    - Automatic logging for operations >100ms (info) and >1s (warning)
  - Enhanced `src/solokit/learning/similarity.py` with caching optimizations:
    - Added `_word_cache` to cache word sets during merge operations
    - Pre-compute word sets once per category (O(n) instead of O(n²))
    - Reduced similarity checking from 4,950 operations to ~100 for 100 learnings
  - Enhanced `src/solokit/work_items/repository.py` with file caching:
    - `load_all()` uses `FileCache` with modification tracking
    - Eliminates 11+ repeated file loads per operation
    - `save_all()` automatically invalidates cache
  - Added 91 comprehensive tests:
    - 16 cache module tests (TTL, thread safety, file caching)
    - 13 performance module tests (decorator, timer, exception handling)
    - Enhanced similarity tests with word cache validation
  - Performance improvements:
    - Similarity checking: 30-50x faster for large learning datasets
    - File I/O: 10x reduction with intelligent caching
    - Automatic performance monitoring built-in across codebase
  - All 1,980 unit tests passing, full type safety with mypy strict mode
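As an illustration of the caching entry above, a thread-safe TTL cache could be sketched roughly as follows. Only the `Cache` class name and its get/set/invalidate/clear operations come from this changelog; the constructor, the parameter names, and the lazy-expiry behavior are assumptions, not Solokit's actual implementation.

```python
from __future__ import annotations

import threading
import time
from typing import Any


class Cache:
    """Thread-safe in-memory cache whose entries expire after a TTL (hypothetical API)."""

    def __init__(self, default_ttl: float = 60.0) -> None:
        self._default_ttl = default_ttl
        self._lock = threading.Lock()
        # Maps key -> (expiry timestamp on the monotonic clock, value).
        self._entries: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl: float | None = None) -> None:
        lifetime = self._default_ttl if ttl is None else ttl
        with self._lock:
            self._entries[key] = (time.monotonic() + lifetime, value)

    def get(self, key: str, default: Any = None) -> Any:
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return default
            expires_at, value = entry
            if time.monotonic() >= expires_at:
                # Expired entries are dropped lazily, on read.
                del self._entries[key]
                return default
            return value

    def invalidate(self, key: str) -> None:
        with self._lock:
            self._entries.pop(key, None)

    def clear(self) -> None:
        with self._lock:
            self._entries.clear()
```

A file-backed variant would presumably add the file's modification time to the stored entry and treat a changed mtime as a miss, which matches the "modification tracking" wording above.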
Changed
- Refactor: Extract constants and remove magic values - Complete centralization
  - Created comprehensive `src/solokit/core/constants.py` module with 31 constants organized into 9 categories
  - Replaced 50+ magic timeout values and hardcoded path strings across 27 files with named constants
  - Added 8 helper functions for type-safe path construction (e.g., `get_session_dir()`, `get_work_items_file()`)
  - Organized constants into logical categories:
    - Git operation timeouts (3): Quick/Standard/Long (5s/10s/30s)
    - Quality gate timeouts (5): From 5s checks to 20min test runs
    - Integration testing timeouts (5): Docker, fixtures, cleanup operations
    - Session workflow timeouts (4): Status, completion, learning extraction
    - Project initialization timeouts (3): Stack detection, tree/graph generation
    - Performance testing (4): Regression thresholds, test timeouts
    - Learning system (5): Curator settings, similarity thresholds
    - Directory and file paths (11): Session directory structure
  - Updated files across all major modules:
    - Core: git/integration.py (13 replacements), session/validate.py
    - Quality: All 8 checker modules + gates.py (22 replacements)
    - Session: complete.py, status.py, briefing modules (8 replacements)
    - Testing: performance.py, integration_runner.py (9 replacements)
    - Other: learning, visualization, project modules (4 replacements)
  - All constants use `Final` type annotations for type safety
  - Benefits: Single source of truth, self-documenting code, easier maintenance, improved readability
  - All 2,180 tests passing, zero linting issues, clean formatting
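The pattern described in this entry can be sketched as below. The helper names `get_session_dir()` and `get_work_items_file()` and the 5s/10s/30s git timeouts come from the entry itself; the constant names, the helper signatures, and the `tracking/` subdirectory are assumptions made for illustration.

```python
from pathlib import Path
from typing import Final

# Git operation timeouts in seconds (values from the changelog, names hypothetical).
GIT_QUICK_TIMEOUT: Final[int] = 5
GIT_STANDARD_TIMEOUT: Final[int] = 10
GIT_LONG_TIMEOUT: Final[int] = 30

# Directory and file names (the "tracking" subdirectory is an assumed layout).
SESSION_DIR_NAME: Final[str] = ".session"
TRACKING_DIR_NAME: Final[str] = "tracking"
WORK_ITEMS_FILENAME: Final[str] = "work_items.json"


def get_session_dir(project_root: Path) -> Path:
    """Return the session directory for a project root."""
    return project_root / SESSION_DIR_NAME


def get_work_items_file(project_root: Path) -> Path:
    """Return the work items file path, built from named constants only."""
    return get_session_dir(project_root) / TRACKING_DIR_NAME / WORK_ITEMS_FILENAME
```

Because every constant is annotated with `Final`, a type checker such as mypy reports any later reassignment, which is what makes the single source of truth enforceable.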
- Refactor: Complete logging consistency refactor - 100% migration to structured logging
  - Migrated all 502 print() statements across 30 files to new structured logging/output system
  - Separated user-facing output from diagnostic logging for better maintainability:
    - Created `OutputHandler` class in `src/solokit/core/output.py` for user-facing messages (stdout/stderr)
    - Enhanced `logging_config.py` with structured logging, JSON formatting, and context management
  - Migrated 21 additional files across 4 batches in Session 29:
    - Batch 1 (100 statements): `reporter.py`, `dependency_graph.py`, `tree.py`
    - Batch 2 (37 statements): `config_validator.py`, `cli.py`, `error_formatter.py`, `stack.py`
    - Batch 3 (24 statements): `milestones.py`, `curator.py`, `repository.py`, work_items stragglers
    - Batch 4 (38 statements): `env_validator.py`, `executor.py`, `performance.py`, `exceptions.py`, and 4 others
  - Fixed all migration issues:
    - Corrected indentation errors and incomplete f-strings from automated migration
    - Fixed variable shadowing bug in `dependency_graph.py` (output vs graph_output)
    - Added missing `output = get_output()` initialization in 8+ modules
    - Updated 45 tests to work with new output system instead of capturing stdout
  - All 2,180 tests passing (100% pass rate) after migration
  - Passed all quality gates: ruff linting, mypy type checking, code formatting
  - Benefits: Cleaner separation of concerns, consistent user experience, better diagnostic logging, structured log output support
- Refactor: Decompose manager.py god-class into modular architecture
  - Decomposed monolithic 1,212-line `WorkItemManager` god-class into 8 focused, single-responsibility modules
  - Created 7 new specialized modules: `repository.py`, `creator.py`, `validator.py`, `query.py`, `updater.py`, `scheduler.py`, `milestones.py`
  - Refactored main `manager.py` from 1,212 to 260 lines (-79% reduction) by delegating to specialized modules
  - Implemented dependency injection pattern with clear module responsibilities:
    - `WorkItemRepository`: Data access and persistence layer (CRUD operations) (235 lines)
    - `WorkItemCreator`: Interactive and non-interactive work item creation with prompts (436 lines)
    - `WorkItemValidator`: Validation logic for integration tests and deployments (197 lines)
    - `WorkItemQuery`: Listing, filtering, searching, sorting, and display (389 lines)
    - `WorkItemUpdater`: Update operations with field validation (211 lines)
    - `WorkItemScheduler`: Work queue management and next item selection (176 lines)
    - `MilestoneManager`: Milestone CRUD operations and progress tracking (133 lines)
  - Created comprehensive test suite: 168 new unit tests for all new modules (213 tests total, up from 111)
  - Added 4 new test files: `test_repository.py`, `test_creator.py`, `test_query.py`, `test_milestones.py`
  - Updated `test_manager.py` to focus on integration testing of the orchestration layer (45 integration tests)
  - Fixed 4 mypy type annotation errors in repository.py for strict type checking compliance
  - All 2,165 tests passing (100% pass rate) including 213 work_items module tests
  - Maintained full backward compatibility with existing WorkItemManager public API
  - Benefits: Single responsibility principle, improved testability, better code navigation, extensibility, loose coupling, easier maintenance
- Refactor: Decompose learning curator god-class into modular architecture
  - Decomposed monolithic 1,226-line `LearningsCurator` god-class into 8 focused, single-responsibility modules
  - Created 6 new specialized modules: `categorizer.py`, `archiver.py`, `extractor.py`, `repository.py`, `reporter.py`, `validator.py`
  - Refactored main `curator.py` from 1,226 to 369 lines (-70% reduction) by delegating to specialized modules
  - Implemented dependency injection pattern with clear module responsibilities:
    - `LearningCategorizer`: Auto-categorization with keyword scoring (124 lines)
    - `LearningArchiver`: Archive management for old learnings (116 lines)
    - `LearningExtractor`: Extract from sessions, git commits, code comments (343 lines)
    - `LearningRepository`: CRUD operations and data persistence (247 lines)
    - `LearningReporter`: Reports, statistics, search, timeline (349 lines)
    - `LearningValidator`: Validation logic and JSON schema (142 lines)
  - Added 13 compatibility wrapper methods to maintain backward compatibility with existing tests
  - Fixed `FileOperationError` exception handling in extractor for graceful JSON parsing failures
  - All 2,143 tests passing (100% pass rate) including 212 learning-related tests
  - Fixed all quality issues: ruff formatting (4 files), mypy type checking (2 errors)
  - Benefits: Single responsibility principle, improved testability, better code navigation, extensibility, loose coupling
- Refactor: Complete Quality Gates modularization into specialized checker architecture
  - Decomposed monolithic 1,370-line `gates.py` god class into 10 focused, single-responsibility checker classes
  - Created modular checker architecture with abstract `QualityChecker` base class and `CheckResult` dataclass
  - Implemented 10 specialized checkers: `SecurityChecker`, `ExecutionChecker`, `LintingChecker`, `FormattingChecker`, `DocumentationChecker`, `SpecCompletenessChecker`, `CustomValidationChecker`, `Context7Checker`, `IntegrationChecker`, `DeploymentChecker`
  - Refactored main `gates.py` from 1,370 to 611 lines (-55%) by delegating to specialized checkers
  - Removed legacy `gates_legacy.py` (1,370 lines) after successfully migrating all functionality
  - Created reporter infrastructure: `ConsoleReporter` and `JSONReporter` for flexible output formatting
  - Added `ResultAggregator` for combining and analyzing checker results
  - Implemented dependency injection pattern with optional `CommandRunner` parameter for fast, isolated testing
  - Created comprehensive test suite: 220 new unit tests for all checker modules (360 tests total, up from 140)
  - Achieved 95%+ code coverage across all new modules (100% on 4 checkers, 94-99% on others)
  - Fixed all quality issues: ruff linting (91 errors), black formatting (28 files), mypy type checking (27 errors)
  - Renamed `TestRunner` to `ExecutionChecker` to avoid pytest collection warnings
  - Added configuration dataclasses: `Context7Config`, `IntegrationConfig`, `DeploymentConfig`
  - All 360 tests passing (100% pass rate) with 0.40s execution time
  - Maintained full backward compatibility with existing QualityGates interface
  - Benefits: Single responsibility principle, easy to test, pluggable architecture, clear separation of concerns, type-safe, highly maintainable
- Refactor: Extract learning similarity engine into dedicated module
  - Created new `src/solokit/learning/similarity.py` module with reusable similarity detection algorithms
  - Implemented `JaccardContainmentSimilarity` class with configurable thresholds and stopword filtering
  - Implemented `LearningSimilarityEngine` with caching, pluggable algorithms, and Protocol-based design
  - Added comprehensive test suite (35 tests) covering similarity algorithms, caching, merging, and edge cases
  - Refactored `LearningsCurator` to delegate similarity operations to the new engine
  - Removed duplicate similarity logic from curator (simplified 4 methods, removed 1 internal method)
  - Fixed all ruff linting issues (14 deprecated typing imports converted to modern syntax)
  - Achieved 100% mypy type checking compliance with proper type annotations
  - All 1783 tests passing with no regressions
  - Benefits: Better separation of concerns, improved testability, reusable similarity algorithms
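To make the Jaccard-plus-containment idea concrete, here is a minimal sketch. The 0.6 and 0.8 thresholds are the defaults quoted in the 0.4 entry of this changelog; the function names, the stopword list, the tokenization, and the use of `>=` with an either-or rule are assumptions and may differ from `JaccardContainmentSimilarity`.

```python
from __future__ import annotations

import re

# Illustrative stopword list; the real list is not given in this changelog.
STOPWORDS = frozenset({"a", "an", "the", "is", "to", "of", "and", "in", "for"})


def word_set(text: str) -> frozenset[str]:
    """Lowercase the text, split it into alphanumeric tokens, drop stopwords."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a: frozenset[str], b: frozenset[str]) -> float:
    """Share of the smaller set that also appears in the larger one."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def are_similar(
    text_a: str,
    text_b: str,
    jaccard_threshold: float = 0.6,
    containment_threshold: float = 0.8,
) -> bool:
    """Treat two learnings as duplicates if either measure reaches its threshold."""
    a, b = word_set(text_a), word_set(text_b)
    return jaccard(a, b) >= jaccard_threshold or containment(a, b) >= containment_threshold
```

Containment catches the case where a short learning is almost entirely contained in a longer one, which Jaccard alone tends to score low.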
- Refactor: Add comprehensive type hints across entire codebase
  - Added complete type hint coverage to all 55 source files in the codebase (100% coverage)
  - Fixed 348 mypy errors across 6 refactoring sessions, achieving 0 type checking errors
  - Modernized type annotations: converted `Optional[X]` to `X | None` syntax (14 occurrences)
  - Added `from __future__ import annotations` to 12 modules for forward reference support
  - Fixed Priority enum comparison methods to accept `object` parameter for protocol compatibility
  - Fixed `ErrorContext.__exit__()` return type to `Literal[False]` for strict context manager protocol
  - Added explicit return type annotations to 100+ functions including nested functions
  - Added type annotations for complex variables: `dict[str, Any]`, `list[dict[str, str]]`, etc.
  - Used `# type: ignore[no-any-return]` for unavoidable Any returns from json.load() and yaml.safe_load()
  - Applied ruff auto-formatting to 8 files for consistent code style
  - All 1520 unit tests passing with no regressions
  - Benefits: IDE autocomplete, early error detection, better refactoring safety, improved documentation
Added
- Core Error Handling Infrastructure
- Implemented comprehensive SDDError exception hierarchy with 50+ specialized exception types
- Added ErrorCode enumeration with 40+ error codes for standardized error identification
- Added ErrorCategory system (SYSTEM, USER, VALIDATION, NETWORK) for error classification
- Implemented ErrorFormatter for consistent error display with exit code mapping
- Added error handling decorators (@log_errors, @convert_subprocess_errors, @convert_file_errors)
- Created structured logging integration with context preservation and exception chaining
- All exceptions include context dict, remediation guidance, and proper exit codes
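The shape of such a hierarchy can be sketched as follows. `SDDError`, `ErrorCategory` with its four members, and `ValidationError` are named in this changelog; the constructor signature, the attribute names, the exit code values, and the `format_error` helper are hypothetical and only illustrate how context, remediation guidance, and exit codes could travel with an exception.

```python
from __future__ import annotations

from enum import Enum
from typing import Any


class ErrorCategory(Enum):
    SYSTEM = "system"
    USER = "user"
    VALIDATION = "validation"
    NETWORK = "network"


class SDDError(Exception):
    """Base error carrying a context dict, remediation guidance, and an exit code."""

    def __init__(
        self,
        message: str,
        *,
        category: ErrorCategory = ErrorCategory.SYSTEM,
        context: dict[str, Any] | None = None,
        remediation: str | None = None,
        exit_code: int = 1,
    ) -> None:
        super().__init__(message)
        self.message = message
        self.category = category
        self.context = context or {}
        self.remediation = remediation
        self.exit_code = exit_code


class ValidationError(SDDError):
    """Specialized error; the default exit code 2 is an assumed mapping."""

    def __init__(self, message: str, **kwargs: Any) -> None:
        kwargs.setdefault("category", ErrorCategory.VALIDATION)
        kwargs.setdefault("exit_code", 2)
        super().__init__(message, **kwargs)


def format_error(error: SDDError) -> str:
    """Render an error for display, appending the remediation hint if present."""
    lines = [f"[{error.category.value}] {error.message}"]
    if error.remediation:
        lines.append(f"Hint: {error.remediation}")
    return "\n".join(lines)
```

A CLI entry point could then catch `SDDError` once, print `format_error(err)`, and exit with `err.exit_code`, leaving business logic free of `sys.exit()` calls.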
Changed
- Standardized Error Handling Migration (Phases 1-3)
- Migrated 33 production files from print() and return tuples to structured exception-based error handling
- Phase 1 (11 files): Core utilities and briefing components
- Phase 2 (8 files): Work item management and validation
- Phase 3A (5 files): Core business logic (git/integration, quality/gates, learning/curator, session/complete, work_items/manager)
- Phase 3B (3 files): Testing infrastructure
- Phase 3C (6 files): Project management and configuration
- Replaced 200+ print() error statements with proper exception raising
- Replaced 26 return tuple patterns with exception-based error handling
- Replaced 8 sys.exit() calls in business logic with exceptions (CLI entry points preserved)
- Replaced 75+ broad Exception catches with specific exception types or catch-and-reraise pattern
- Added @log_errors() decorators to 40+ key functions for structured logging
- Updated 9 test files with pytest.raises() patterns and exception validation
- Quality gates intentionally kept 47 return tuples for result aggregation (not errors)
- All 1750 tests passing (100% coverage maintained)
Fixed
- Linting and Formatting
- Fixed 77 type annotation warnings (Optional[X] → X | None) using ruff --unsafe-fixes
- Added missing ValidationError import in session/briefing.py
- Formatted 31 files with ruff format for consistent code style
- All ruff checks passing with zero errors
Investigated
- Dataclass Migration Analysis
- Investigated replacing dictionary-based data structures with Python dataclasses across the codebase
- Analysis identified 1,260 dictionary patterns across 57 files requiring migration
- Estimated effort: 30-35 hours with high risk of introducing bugs
- Decision: Deferred indefinitely - current dict-based approach is stable and well-tested
- Rationale: Low ROI for a working CLI tool, prefer TypedDict for gradual type improvements
- All 1,471 tests passing (1,333 unit + 138 integration)
Changed
- Refactor: Consolidate subprocess execution with CommandRunner
  - Replaced all direct `subprocess.run()` calls with centralized `CommandRunner` class
  - Updated 10 production files to use `CommandRunner` for consistent command execution:
    - `visualization/dependency_graph.py` - Graphviz SVG generation
    - `session/validate.py` - Git status validation
    - `session/status.py` - Git diff operations
    - `session/complete.py` - Stack/tree updates and git operations
    - `learning/curator.py` - Git log extraction
    - `testing/performance.py` - wrk load testing and docker operations
    - `testing/integration_runner.py` - Docker-compose and test execution
    - `project/tree.py` - Tree command execution
    - `project/stack.py` - Language version detection
    - `project/init.py` - Git init and dependency installation
  - Updated 9 test files with proper `CommandRunner` mocking patterns using `CommandResult` objects
  - Benefits: consistent error handling, timeout management, retry logic, and centralized logging
  - Fixed pytest collection warning by renaming `TestExecutionConfig` to `ExecutionConfig`
  - All 1,563 tests passing with zero warnings
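A centralized runner of this kind might look roughly like the sketch below. The class names `CommandRunner` and `CommandResult` appear in this changelog, but the fields, the `run()` signature, and the sentinel return codes are assumptions; retry logic and logging are left out for brevity.

```python
from __future__ import annotations

import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandResult:
    """Outcome of one command; field names are illustrative."""

    returncode: int
    stdout: str
    stderr: str
    timed_out: bool = False

    @property
    def success(self) -> bool:
        return self.returncode == 0 and not self.timed_out


class CommandRunner:
    """Single place where subprocesses are started, with a default timeout."""

    def __init__(self, default_timeout: float = 10.0) -> None:
        self.default_timeout = default_timeout

    def run(self, args: list[str], timeout: float | None = None) -> CommandResult:
        try:
            proc = subprocess.run(
                args,
                capture_output=True,
                text=True,
                timeout=self.default_timeout if timeout is None else timeout,
                check=False,
            )
        except subprocess.TimeoutExpired:
            # -1 is an arbitrary sentinel chosen for this sketch.
            return CommandResult(-1, "", "command timed out", timed_out=True)
        except FileNotFoundError as exc:
            # 127 mirrors the shell convention for "command not found".
            return CommandResult(127, "", str(exc))
        return CommandResult(proc.returncode, proc.stdout, proc.stderr)
```

Because callers only ever see `CommandResult`, tests can substitute a fake runner that returns prebuilt results instead of patching `subprocess.run()` in every module.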
- Refactor: Decompose briefing.py god-class into modular package
  - Decomposed monolithic 1,166-line `session/briefing.py` into focused package structure with 9 modules
  - Created `session/briefing/` package with single-responsibility modules averaging ~150 lines each:
    - `orchestrator.py` - SessionBriefing class for coordinating components
    - `work_item_loader.py` - WorkItemLoader for loading and resolving work items
    - `learning_loader.py` - LearningLoader for loading and scoring relevant learnings
    - `documentation_loader.py` - DocumentationLoader for project docs discovery
    - `stack_detector.py` - StackDetector for technology stack detection
    - `tree_generator.py` - TreeGenerator for directory tree loading
    - `git_context.py` - GitContext for git status and branch operations
    - `milestone_builder.py` - MilestoneBuilder for milestone context
    - `formatter.py` - BriefingFormatter for text formatting and generation
  - 100% backward compatibility maintained through wrapper functions in `__init__.py`
  - Added `GitStatus.PR_CLOSED` and `GitStatus.DELETED` enum values for complete git workflow states
  - Class-based API enables better testability, reusability, and dependency injection
  - All 1,440 unit and integration tests passing with no regressions
  - Created comprehensive migration guide in `docs/development/BRIEFING_REFACTOR_MIGRATION_GUIDE.md`
  - Benefits: improved maintainability, testability, code organization, and extensibility
- Refactor: Replace magic strings with type-safe enums
  - Created comprehensive enum system in `core/types.py` with 4 enums: WorkItemType, WorkItemStatus, Priority, GitStatus
  - Updated 12 modules to use type-safe enums instead of magic strings
  - Priority enum supports comparison operations (<, >, <=, >=) for prioritization logic
  - GitStatus enum updated to match actual workflow states (in_progress, ready_to_merge, ready_for_pr, pr_created, merged)
  - All enums inherit from `str` for seamless JSON serialization compatibility
  - Each enum provides `.values()` class method for validation and iteration
  - 100% backward compatibility maintained - no changes to JSON data formats
  - All 1,532 tests passing with no regressions
  - Created comprehensive documentation in `docs/development/ENUM_USAGE_GUIDE.md` with usage patterns, examples, and migration guide
  - Benefits: IDE autocomplete, type safety, easier refactoring, single source of truth for valid values
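A `str`-based enum with ordering and a `.values()` helper, as described in this entry, could be sketched like this. The member names, their string values, and the ordering are assumptions; only the `Priority` name, the `str` base, the comparison support, and `.values()` come from the changelog.

```python
from __future__ import annotations

import json
from enum import Enum

# Lowest to highest priority; the member values are illustrative assumptions.
_PRIORITY_ORDER = ("low", "medium", "high", "critical")


class Priority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

    @classmethod
    def values(cls) -> list[str]:
        """Return all valid string values, e.g. for input validation."""
        return [member.value for member in cls]

    def _rank(self) -> int:
        return _PRIORITY_ORDER.index(self.value)

    # All four operators are defined explicitly; otherwise the inherited
    # str comparisons would order members alphabetically.
    def __lt__(self, other: object) -> bool:
        if not isinstance(other, Priority):
            return NotImplemented
        return self._rank() < other._rank()

    def __le__(self, other: object) -> bool:
        if not isinstance(other, Priority):
            return NotImplemented
        return self._rank() <= other._rank()

    def __gt__(self, other: object) -> bool:
        if not isinstance(other, Priority):
            return NotImplemented
        return self._rank() > other._rank()

    def __ge__(self, other: object) -> bool:
        if not isinstance(other, Priority):
            return NotImplemented
        return self._rank() >= other._rank()


def dump_work_item(title: str, priority: Priority) -> str:
    """Serialize directly; the str base lets json treat members as plain strings."""
    return json.dumps({"title": title, "priority": priority})
```

Reading data back is the reverse: `Priority("high")` returns the existing `Priority.HIGH` member, so stored JSON does not need to change.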
- Refactor: Centralized configuration management with ConfigManager
  - Created `core/config.py` with singleton ConfigManager for centralized config loading
  - Type-safe dataclasses for all config sections (QualityGatesConfig, CurationConfig, GitConfig)
  - Caching mechanism to avoid redundant file reads with invalidation support
  - Refactored 5 modules to use ConfigManager: `quality/gates.py`, `git/integration.py`, `learning/curator.py`, `session/complete.py`, `session/validate.py`
  - 21 comprehensive unit tests for ConfigManager with 98% coverage
  - Fixed 8 previously skipped tests in test suite
  - Removed 3 obsolete test classes (duplicate config loading tests)
  - All 1256 unit tests pass (up from 1248) with 0 skipped tests
  - Net reduction of 183 lines of code through deduplication
  - Eliminated duplicate config loading logic across modules
- Refactor: Consolidated JSON file I/O operations
  - Centralized all JSON file operations in `core/file_ops.py` with `JSONFileOperations` class
  - Added `FileOperationError` exception for consistent error handling
  - Enhanced features: atomic writes by default, optional validation hooks, automatic directory creation
  - New `load_json_safe()` method for guaranteed return (never raises)
  - Removed duplicate `_load_json` and `_save_json` methods from `learning/curator.py`
  - 97% test coverage with 41 comprehensive unit tests
  - All 1240 unit tests pass with no regressions
  - Eliminated ~100+ lines of duplicate code across codebase
  - Created comprehensive API reference documentation in `docs/reference/file-operations-api.md`
  - Updated architecture documentation
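Atomic writes and a never-raising loader can be sketched with the standard library alone. The name `load_json_safe` comes from this entry; `save_json_atomic`, both signatures, and the temp-file-plus-`os.replace` technique are assumptions about one reasonable way to implement the behavior, not a description of `JSONFileOperations` itself.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path
from typing import Any


def save_json_atomic(path: Path, data: Any) -> None:
    """Write JSON to a temp file in the target directory, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)  # automatic directory creation
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace is atomic when source and target share a filesystem,
        # so readers see either the old file or the complete new one.
        os.replace(tmp_name, path)
    except BaseException:
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def load_json_safe(path: Path, default: Any = None) -> Any:
    """Return parsed JSON, or the default if the file is missing or unreadable."""
    try:
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)
    except (OSError, ValueError):
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        return default
```

Creating the temp file next to the target, rather than in the system temp directory, is what keeps the final rename on a single filesystem.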
[0.1.0] - 2025-10-26
Note: Versions 0.6.0 and 0.7.0 were development versions that have been consolidated into the 0.1.x public release series.
Added
- Enhanced session briefings with context continuity
  - Previous Work section for in-progress items showing commits, file stats, and quality gates from prior sessions
  - Enriched session summaries with full commit messages and file change statistics
  - Enhanced learning relevance scoring using multi-factor algorithm (keywords, type, recency, category bonuses)
  - Top 10 relevant learnings (up from 5) with intelligent scoring
  - Fixes briefing update bug - briefings now regenerated for in-progress items
  - Fixes timing issue - work_items data reloaded after recording commits to ensure accurate summaries
  - Makes multi-session work practical by eliminating context loss
  - 22 new comprehensive unit tests for helper functions and enhanced functionality
  - Updated documentation in `.claude/commands/start.md` and `.claude/commands/end.md`
- Work item deletion - Safe deletion of work items with dependency checking
  - New `sk work-delete <work_item_id>` command
  - Interactive mode with 3 options: keep spec, delete spec, or cancel
  - Non-interactive mode with `--keep-spec` and `--delete-spec` flags
  - Dependency checking warns about dependent work items
  - Automatic metadata updates (total_items, status counts)
  - 19 comprehensive unit tests
  - Full documentation in `.claude/commands/work-delete.md` and `docs/commands/work-delete.md`
- Work item completion status control - Explicit control over work item completion during session end
  - Interactive 3-choice prompt: "Mark completed", "Keep in-progress", "Cancel"
  - Command-line flags: `--complete` and `--incomplete`
  - Supports multi-session workflows
  - 8 unit tests added
- PyPI Publishing Workflow - Automated package publishing to PyPI on GitHub releases
- Comprehensive test infrastructure - Test suite reorganization and expansion
  - 1,408 comprehensive tests (up from 183, 765% increase)
  - 85% code coverage (up from 30%)
  - Unit/integration/e2e structure across 35 test files
  - 4 modules at 100% coverage, 20 modules at 75%+ coverage
- Auto git initialization - `sk init` now automatically initializes git repository and creates initial commit
- Pre-flight commit check - `sk end` validates all changes are committed before running quality gates
- CHANGELOG workflow improvements - Git hooks with reminders + smarter branch-level detection
- OS-specific .gitignore patterns - macOS, Windows, and Linux patterns automatically added during `sk init`
Changed
- BREAKING: Package structure migrated to standard Python src/ layout
  - Moved all Python modules from flat directory to organized `src/solokit/` package structure
  - Created domain-organized subdirectories: `core/`, `session/`, `work_items/`, `learning/`, `quality/`, `visualization/`, `git/`, `testing/`, `deployment/`, `project/`
  - Updated all imports from `scripts.X` to `solokit.X` pattern (43 files)
  - Removed all `sys.path.insert()` hacks (38 instances)
  - Removed `setup.py` in favor of PEP 517/518 pyproject.toml-only configuration
  - CLI command remains `solokit` (no user-facing changes)
  - All tests pass, PyPI-ready structure, better IDE support
- Simplified git branch naming - Branch names now use work item ID directly
  - Format: `feature_oauth` instead of `session-001-feature_oauth`
  - Clearer intent, shorter names, backward compatible
- Standardized spec validation - All work item types now use "Acceptance Criteria" section consistently
  - Updated refactor specs to use "Acceptance Criteria" (was "Success Criteria")
- Makefile clean target - Now removes coverage artifacts (`htmlcov/`, `coverage.xml`, `coverage.json`)
Fixed
- Quality gates test timeout - Increased from 5 to 10 minutes (1408 tests take ~6 minutes)
- Docstring validation - Fixed pydocstyle configuration to properly validate project docstrings
- Bug #25: Git branch status now finalizes when switching work items (12 unit tests)
- Bug #24: `/start` command now properly handles explicit work item selection (3 unit tests)
- Bug #23: Bug/refactor spec templates now include "Acceptance Criteria" section
- Bug #21: Learning curator no longer extracts test data strings (21 unit tests)
- Bug #20: Multi-line LEARNING statements now captured completely (30 unit tests)
- UX improvements: Auto git init, pre-flight checks, CHANGELOG reminders, clear error messages
Removed
- Deleted obsolete development tracking files (`NEXT_SESSION_PROMPT.md`, `TEST_PROGRESS.md`)
- Removed 38 instances of `sys.path.insert()` manipulation
- Removed flat directory structure
- Removed E402 ignore from ruff config
[0.5.8] - 2025-10-21
Added
- Marketplace Plugin Support: Solokit now works as a Claude Code marketplace plugin
- One-time setup command for plugin users: `pip install -e ~/.claude/plugins/marketplaces/claude-plugins/solokit`
- Simplified installation documentation with clear paths for both marketplace and direct installation
Changed
- Unified CLI: All 15 slash command files now use `solokit` command instead of relative paths
- Updated command files: `init.md`, `start.md`, `end.md`, `status.md`, `validate.md`, `learn*.md`, `work-*.md`
- Simplified README installation section with two clear options (marketplace vs. direct)
- Updated all CLI examples throughout documentation to use `solokit` command
- Updated marketplace README (`claude-plugins/README.md`) with v0.5.8 installation instructions
- Updated Architecture Notes to reflect v0.5.8 changes
Technical Details
- Files Modified: 18 files total
  - 15 command files (`.claude/commands/*.md`)
  - 1 main README (`README.md`)
  - 1 marketplace README (in separate repo)
  - 1 pyproject.toml (version bump)
- Breaking Changes: Command files no longer use relative Python paths - now use `solokit` CLI
- Migration: Users must run `pip install -e .` if not already done
Migration Guide
For marketplace plugin users:
pip install -e ~/.claude/plugins/marketplaces/claude-plugins/solokit
For existing direct installations:
cd /path/to/solokit
pip install -e .
All slash commands will now work via the solokit CLI.
Benefits
- ✅ Plugin works from marketplace installation
- ✅ No need to clone Solokit into every project
- ✅ Cleaner, more standard approach
- ✅ Works identically whether installed directly or via marketplace
- ✅ Aligns with Python package best practices
Reference
See ROADMAP.md Phase 5.8 for complete details.
[0.5.7] - 2025-10-18
Added
- Spec-first architecture: `.session/specs/*.md` files are now the single source of truth for work item content
- Comprehensive markdown parser (`spec_parser.py`, 700+ lines) supporting all 6 work item types
- Spec file validation system with required section checks and quality gates
- Complete context loading - removed all compression (50-line tree limit, 500-char doc limits)
- Writing guide (`docs/guides/writing-specs.md`, 500+ lines) with examples for all work item types
- Template structure documentation (`docs/reference/spec-template-structure.md`)
Changed
- Eliminated dual storage problem - work item content now only in spec files, not `work_items.json`
- Enhanced all 6 spec templates with comprehensive examples and inline guidance
- Updated briefing system to load full spec content without truncation
- Refactored validators and runners to use spec parser
- Quality gates now validate spec completeness before session completion
Removed
- Content fields from `work_items.json` (rationale, acceptance_criteria, implementation_paths, test_paths)
- Compression limits on project documentation
- Duplicate briefing sections
Technical Details
- Tests Added: 49 tests across 6 test files
- Code Added: ~3,200 lines (spec_parser.py, spec_validator.py, templates, docs)
- Files Created: 8 new files (validator, docs, test files)
- Files Enhanced: 12 files (briefing_generator, quality_gates, templates, commands)
Reference
See ROADMAP.md Phase 5.7 for complete details.
[0.5.6] - 2025-10-15
Added
- Deployment work item type with comprehensive validation framework
- Deployment execution framework with pre-deployment validation and rollback automation
- Environment validation system with 7 validation types (connectivity, configuration, dependencies, health checks, monitoring, infrastructure, capacity)
- Deployment quality gates integrated with `quality_gates.py`
- Multi-environment support (staging vs production with different configurations)
- Automated smoke test execution with timeout and retry support
- Dry-run mode for deployment simulation
Changed
- Enhanced `deployment_spec.md` template with 11 sections including deployment procedure, rollback, smoke tests
- Session workflow now includes deployment-specific briefings and summaries
- Quality gates include deployment validation before execution
Technical Details
- Tests Added: 65 tests across 5 test files
- Code Added: ~2,049 lines (deployment_executor.py, environment_validator.py, enhanced templates)
- Validation Types: 7 comprehensive environment checks
- Focus: Production deployment safety and automation
Reference
See ROADMAP.md Phase 5.6 for complete details.
[0.5.5] - 2025-10-15
Added
- Integration testing framework with comprehensive validation
- Enhanced integration test work item type with multi-component dependency tracking
- Integration test execution framework with Docker Compose orchestration
- Performance benchmarking system with regression detection (10% threshold)
- API contract validation with breaking change detection
- Integration quality gates with environment validation
- Integration documentation requirements (architecture diagrams, sequence diagrams, API contracts)
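The 10% regression threshold mentioned above amounts to a relative comparison against a stored baseline. A minimal sketch, assuming latency in milliseconds where higher is worse; the function name and signature are hypothetical:

```python
def is_regression(baseline_ms: float, current_ms: float, threshold: float = 0.10) -> bool:
    """Flag a regression when latency grew by more than the threshold fraction."""
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    relative_change = (current_ms - baseline_ms) / baseline_ms
    return relative_change > threshold
```

For a throughput metric, where lower is worse, the sign of the comparison would need to be flipped.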
Changed
- Enhanced `integration_test_spec.md` template with test scenarios, performance benchmarks
- Session workflow includes integration-specific briefings and summaries
- Quality gates validate integration test environment before execution
Technical Details
- Tests Added: 178 tests across 7 test files
- Code Added: ~5,458 lines (integration_test_runner.py, performance_benchmark.py, api_contract_validator.py)
- Performance Tracking: Latency percentiles (p50, p75, p90, p95, p99), throughput, response time
- Focus: Multi-service integration validation and performance regression detection
Reference
See ROADMAP.md Phase 5.5 for complete details.
[0.5] - 2025-10-14
Added
- Quality gates system for automated quality enforcement at session completion
- Test execution with coverage parsing and multi-language support (Python, JavaScript, TypeScript)
- Security scanning integration (bandit, safety, npm audit) with severity-based filtering
- Linting and formatting with auto-fix modes (ruff, eslint, prettier)
- Documentation validation (CHANGELOG, docstrings, README)
- Context7 MCP integration (stub ready for production)
- Custom validation rules (per-work-item and project-level)
- Quality gate reporting with remediation guidance
Changed
- Session completion now enforces quality standards before allowing completion
- Extracted quality gate logic into dedicated quality_gates.py module (770 lines)
- Added quality gates configuration to .session/config.json during /init
Fixed
- pytest exit code 5 ("no tests collected") now treated as skipped, not failed
- Auto-fix modes for linting and formatting improve developer experience
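
The pytest fix above can be pictured as a small mapping from exit codes to gate outcomes. pytest does use exit code 5 for "no tests collected"; the function name and the status strings below are illustrative assumptions rather than the quality_gates.py implementation.

```python
def classify_pytest_exit(code: int) -> str:
    """Map a pytest exit code to a gate status (hypothetical status names)."""
    if code == 0:
        return "passed"
    if code == 5:
        # pytest exits with 5 when no tests were collected
        return "skipped"
    return "failed"  # 1 = test failures, 2-4 = interruption, internal or usage errors


statuses = [classify_pytest_exit(code) for code in (0, 1, 5)]
```

Treating code 5 as skipped lets a project without tests complete a session, while any other non-zero code still blocks it.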
Technical Details
- Tests Added: 54 tests across all quality gate types
- Code Added: 875 lines (quality_gates.py, config integration)
- Tools Supported: pytest, ruff, bandit, safety, eslint, prettier, npm audit
- Configuration: Required vs optional gate enforcement
Reference
See ROADMAP.md Phase 5 for complete details.
[0.4] - 2025-10-14
Added
- Learning capture and curation system for knowledge management
- 4 learning commands: /learn, /learn-show, /learn-search, /learn-curate
- Auto-categorization into 6 categories (architecture_patterns, gotchas, best_practices, technical_debt, performance_insights, security)
- Similarity detection using Jaccard (0.6) and containment (0.8) thresholds
- Automatic duplicate detection and merging
- Multi-source learning extraction (session summaries, git commits with LEARNING:, inline # LEARNING: comments)
- Enhanced browsing with filters (category, tags, date range, session number)
- Statistics dashboard and timeline view
- Auto-curation trigger every N sessions (default 5, configurable)
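
To make the similarity thresholds above concrete, here is a minimal sketch of Jaccard and containment scoring over word sets. Whitespace tokenization, combining the two scores with "or", and all function names are assumptions; learning_curator.py may tokenize and combine the scores differently.

```python
def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens (a simplifying assumption)."""
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by all distinct words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a: set[str], b: set[str]) -> float:
    """Shared words divided by the size of the smaller set."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def is_similar(x: str, y: str, jaccard_min: float = 0.6,
               containment_min: float = 0.8) -> bool:
    """Flag two learnings as likely duplicates if either score clears its threshold."""
    a, b = tokens(x), tokens(y)
    return jaccard(a, b) >= jaccard_min or containment(a, b) >= containment_min


near_duplicate = is_similar("always pin dependency versions",
                            "always pin dependency versions in CI")
subset_match = is_similar("pin versions",
                          "always pin dependency versions in ci builds today")
unrelated = is_similar("use retries for flaky network calls",
                       "cache database queries")
```

Containment catches the case where a short learning is fully contained in a longer one, which Jaccard alone would score low.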
Changed
- Sessions now include automated learning capture at completion
- .session/config.json includes learning configuration (auto_curate_frequency, similarity_threshold)
Technical Details
- Tests Added: 53 tests across all learning features
- Code Added: ~1,587 lines (commands, documentation, integration)
- Documentation: docs/reference/learning-system.md guide (550 lines)
- Categories: 6 comprehensive categories covering software development learnings
Reference
See ROADMAP.md Phase 4 for complete details.
[0.3] - 2025-10-13
Added
- Work item dependency graph visualization with critical path analysis
- /work-graph command with multiple output formats (ASCII, DOT, SVG)
- Graph filtering options (status, milestone, type, focus node, include-completed)
- Critical path analysis with automatic highlighting in all formats
- Bottleneck detection (items blocking 2+ others)
- Graph statistics (total items, completion percentage, critical path length)
- Neighborhood view with --focus for exploring specific work items
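
Bottleneck detection as described above (items blocking 2+ others) can be sketched by counting direct dependents. The input shape, a mapping from each work item to the items it depends on, and the function name are assumptions, not the dependency_graph.py data model.

```python
from collections import Counter


def find_bottlenecks(dependencies: dict[str, list[str]],
                     min_blocked: int = 2) -> list[str]:
    """Return items that at least `min_blocked` other items directly depend on."""
    blocked_counts = Counter(
        dep for deps in dependencies.values() for dep in set(deps)
    )
    return sorted(item for item, count in blocked_counts.items()
                  if count >= min_blocked)


# Usage with hypothetical work item ids.
work_items = {
    "db": [],
    "api": ["db"],
    "ui": ["api", "db"],
    "docs": ["api"],
}
bottlenecks = find_bottlenecks(work_items)
```

This counts only direct dependents; a variant that counts transitive dependents would rank foundational items higher.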
Changed
- Enhanced dependency_graph.py with 313 new lines for CLI integration
- Graph visualization updates automatically when work items change
Technical Details
- Tests Added: 36 tests across 6 sections
- Code Added: 426 lines (command integration, enhanced graph features)
- Formats: ASCII (terminal-friendly), DOT (Graphviz), SVG (documentation)
- Focus: Understanding project structure and identifying bottlenecks
Reference
See ROADMAP.md Phase 3 for complete details.
[0.2] - 2025-10-13
Added
- Work item management system with full CRUD operations
- 6 work item types (feature, bug, refactor, security, integration_test, deployment)
- 5 work item commands: /work-new, /work-list, /work-show, /work-update, /work-next
- Dependency tracking and resolution
- Priority levels (critical, high, medium, low) with visual indicators (🔴🟠🟡🟢)
- Milestone organization and progress tracking
- Status tracking (backlog, in_progress, completed, blocked)
- Conversational interface for work item creation (Claude Code compatible)
Changed
- Sessions now include comprehensive work item tracking
- Briefings include milestone context and dependency status
- /status command shows work item context and progress
Technical Details
- Tests Added: 9 tests for work item management
- Code Added: work_item_manager.py (500+ lines)
- CLI Commands: Non-interactive mode for Claude Code compatibility
- Storage: JSON-based work item tracking in .session/work_items.json
Reference
See ROADMAP.md Phase 2 for complete details.
[0.1] - 2025-10-13
Added
- Core session management framework with complete workflow
- /init command for project initialization
- Stack tracking system (generate_stack.py) with technology detection
- Tree tracking system (generate_tree.py) with structure change detection
- Git workflow integration (git_integration.py) with branch management
- Enhanced /start with comprehensive context loading (docs, stack, tree, git)
- Enhanced /end with tracking updates and quality gates
- /validate command for pre-flight checks before session completion
- Multi-session work item support (resume on same branch)
Changed
- Session initialization creates .session/ directory structure
- Briefings include full project context (vision, architecture, stack, tree)
- Session completion updates all tracking files automatically
Technical Details
- Tests Added: 6 core tests
- Code Added: 2,174 lines across 12 scripts
- Infrastructure: .session/ directory with tracking files
- Git Integration: Automatic branch creation, commit, push, merge
Reference
See ROADMAP.md Phase 1 for complete details.
[0.0] - 2025-10-10
Added
- Foundation and documentation for Session-Driven Development methodology
- Repository structure with .claude/commands/ directory (16 slash commands)
- Basic briefing generation (briefing_generator.py)
- Basic session completion (session_complete.py)
- Learning curation system (learning_curator.py) - complete and production-ready
- Dependency graph visualization (dependency_graph.py) - complete and production-ready
- File operation utilities (file_ops.py)
- Comprehensive methodology documentation (docs/solokit-methodology.md)
- Implementation insights documentation (docs/implementation-insights.md)
- AI-augmented framework reference (docs/ai-augmented-solo-framework.md)
Technical Details
- Work Item Schema: Defined in templates/work_items.json
- Learning Schema: Defined in templates/learnings.json
- Algorithms: Dependency resolution (DFS-based), Learning categorization (keyword-based), Similarity detection (Jaccard + containment)
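
For readers unfamiliar with DFS-based dependency resolution, the sketch below orders work items so that each appears after everything it depends on, and reports cycles. It is a generic depth-first topological sort under assumed names and input shape, not a copy of the project's resolver.

```python
def resolve_order(dependencies: dict[str, list[str]]) -> list[str]:
    """Return items ordered so that dependencies come before their dependents."""
    order: list[str] = []
    state: dict[str, str] = {}  # item -> "visiting" or "done"

    def visit(item: str) -> None:
        if state.get(item) == "done":
            return
        if state.get(item) == "visiting":
            raise ValueError(f"dependency cycle detected at {item!r}")
        state[item] = "visiting"
        for dep in dependencies.get(item, []):
            visit(dep)
        state[item] = "done"
        order.append(item)  # appended only after all of its dependencies

    for item in dependencies:
        visit(item)
    return order


# Usage with hypothetical work item ids.
build_order = resolve_order({"deploy": ["api"], "api": ["db"], "db": []})
```

The recursive form is fine for small graphs; a very deep dependency chain would need an explicit stack to stay within Python's recursion limit.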
Reference
See ROADMAP.md Phase 0 for complete details.
Version Numbering
Versions follow semantic versioning (MAJOR.MINOR.PATCH):
- MAJOR: Incompatible API changes
- MINOR: New functionality (backward compatible)
- PATCH: Bug fixes (backward compatible)
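
As a small illustration of how MAJOR.MINOR.PATCH versions order, the sketch below parses a three-part version into a tuple of integers so that comparison is numeric rather than textual. The helper names are hypothetical, and pre-release or build suffixes are deliberately not handled.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'v0.3.0' or '0.3.0' into (major, minor, patch)."""
    parts = text.removeprefix("v").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    major, minor, patch = (int(part) for part in parts)
    return (major, minor, patch)


def change_kind(old: str, new: str) -> str:
    """Name the highest-order component that differs between two versions."""
    for name, before, after in zip(("major", "minor", "patch"),
                                   parse_version(old), parse_version(new)):
        if before != after:
            return name
    return "none"


newer = parse_version("v0.3.0") > parse_version("v0.2.2")
numeric = parse_version("0.10.0") > parse_version("0.9.0")
kind = change_kind("v0.2.2", "v0.3.0")
```

Comparing tuples of integers avoids the string-comparison pitfall where "0.10.0" sorts before "0.9.0".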
Phase mapping to public release versions:
- Phases 0-5.9 (Development phases) → v0.1.0 (Initial Public Release)
- Phase 0: Foundation & documentation
- Phase 1: Core session workflow
- Phase 2: Work item system
- Phase 3: Dependency graphs
- Phase 4: Learning management
- Phase 5: Quality gates
- Phase 5.5: Integration testing
- Phase 5.6: Deployment support
- Phase 5.7: Spec-first architecture
- Phase 5.8: Marketplace plugin support
- Phase 5.9: Standard Python src/ layout & PyPI publishing
- v0.3.0 = Current release ✅ (Minimal init mode, bug fixes)
- v0.2.2 = Previous release (Fix missing test_execution.commands config)
- v0.2.1 = Earlier release (Critical CVE patches for Next.js/React templates)
- v0.2.0 = Earlier release (Tailwind CSS v4 migration, CHANGELOG check fixes)
- v0.1.7 = Earlier release (Improved /end command flow, slash command format)
- v0.1.6 = Earlier release (Minimal scaffolding migration complete)
- v0.1.5 = Earlier release (GitHub setup integration, security fixes)
- v0.1.4 = Earlier release (Test coverage improvements)
- v0.1.3 = Earlier release (Documentation model improvements)
- v0.1.1 = Earlier release (UX improvements & bug fixes)
- v1.0.0 = Stable API release (planned)
Links
- Roadmap - Detailed development history and technical implementation
- Contributing - How to contribute (if available)
- Documentation - Full documentation index
- Solokit Methodology - Complete methodology specification