Solokit Workflow Enhancements
January 16, 2026
This document tracks identified workflow improvements to make Solokit more user-friendly and automated.
Status Legend
- 🔵 IDENTIFIED - Enhancement identified, not yet implemented
- 🟡 IN_PROGRESS - Currently being worked on
- ✅ IMPLEMENTED - Completed and merged
Completed Enhancements
All core workflow enhancements have been implemented:
- Enhancement #1: Auto Git Initialization in sk init → ✅ IMPLEMENTED
- Enhancement #2: CHANGELOG Update Workflow → ✅ IMPLEMENTED
- Enhancement #3: Pre-flight Commit Check in /sk:end → ✅ IMPLEMENTED
- Enhancement #4: Add OS-Specific Files to Initial .gitignore → ✅ IMPLEMENTED
- Enhancement #5: Create Initial Commit on Main During sk init → ✅ IMPLEMENTED
- Enhancement #6: Work Item Completion Status Control → ✅ IMPLEMENTED (Session 11)
- Enhancement #7: Phase 1 - Documentation Reorganization & Project Files → ✅ IMPLEMENTED (Session 8)
- Enhancement #8: Phase 2 - Test Suite Reorganization → ✅ IMPLEMENTED (Session 9, 1,401 tests, 85% coverage)
- Enhancement #9: Phase 3 - Complete Phase 5.9 (src/ Layout Transition) → ✅ IMPLEMENTED (Session 12)
- Enhancement #10: Add Work Item Deletion Command → ✅ IMPLEMENTED (Session 13, PR #90)
- Enhancement #11: Enhanced Session Briefings with Context Continuity → ✅ IMPLEMENTED (Session 14)
- Enhancement #12: Change /sk:end Default to Complete → ✅ IMPLEMENTED (Session 30)
- Enhancement #13: Interactive UI Integration → ✅ IMPLEMENTED (Session 31)
- Enhancement #14: Template-Based Project Initialization → ✅ IMPLEMENTED (Session 32)
Enhancement #15: Session Briefing Optimization
Status: 🔵 IDENTIFIED
Problem:
Session briefings currently consume significant context window space and may not provide the most useful information for AI-assisted development. Potential issues include:
- Excessive context usage: Briefings may include redundant or less relevant information
- Missing critical context: Important architectural constraints or patterns might be omitted
- Inefficient information structure: Data not organized for optimal AI consumption
- Lack of progressive disclosure: All information presented upfront rather than contextually
Proposed Solution:
Research and optimize session briefing content and structure to:
- Maximize information value: Include only the most relevant and actionable information
- Minimize context usage: Compress or restructure data for efficiency
- Improve AI effectiveness: Format information in ways that improve AI understanding
- Context-aware loading: Load additional detail on-demand rather than upfront
Implementation:
To be researched and determined during implementation. May include:
- Analysis of current briefing content and usage patterns
- Identification of high-value vs low-value information
- Experimentation with different information structures
- Compression techniques for historical data
- Dynamic context loading strategies
Files Affected:
Modified:
- src/solokit/session/briefing.py - Session briefing generation
- src/solokit/session/briefing/ - Briefing module components
- .claude/commands/start.md - Start command documentation
- Briefing templates and data structures
Benefits:
- More context available: Reduced briefing size leaves more room for code and docs
- Better AI assistance: Higher quality information improves AI effectiveness
- Faster sessions: Less time loading and processing briefing data
- Improved focus: Only relevant information presented
Priority: High - Impacts every session and all future work
Notes:
- Details to be researched and refined during implementation
- May involve experimentation and iteration to find optimal approach
- Should measure context usage before and after optimization
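The before-and-after measurement mentioned in the notes could start from a rough size estimate such as the sketch below. It assumes a briefing is available as plain text and uses the common approximation of about four characters per token; `estimate_tokens` and `compare_briefings` are hypothetical helpers, not existing Solokit functions.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the 4-characters-per-token ratio is an assumption."""
    return round(len(text) / chars_per_token)


def compare_briefings(before: str, after: str) -> dict:
    """Report estimated token counts and the relative reduction between two briefings."""
    before_tokens = estimate_tokens(before)
    after_tokens = estimate_tokens(after)
    saved = before_tokens - after_tokens
    # Guard against an empty "before" briefing to avoid dividing by zero.
    reduction = saved / before_tokens if before_tokens else 0.0
    return {
        "before": before_tokens,
        "after": after_tokens,
        "saved": saved,
        "reduction": reduction,
    }
```

A real measurement would probably swap the character heuristic for the tokenizer of the model in use, but a consistent estimate is enough to compare two briefing formats.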
Enhancement #16: Pre-Merge Security Gates
Status: 🔵 IDENTIFIED
Problem:
Currently, security scans run at /sk:end, but if they fail, code might already be committed to the branch. There's no enforcement mechanism to prevent merging insecure code to main. Critical security issues include:
- Secret exposure: API keys, passwords, tokens accidentally committed
- Known vulnerabilities: Dependencies with CVEs in production
- Code vulnerabilities: SQL injection, XSS, insecure authentication patterns
- Supply chain attacks: Malicious packages in dependencies
- License compliance: Incompatible licenses that create legal risk
Current Workflow Gap:
Code written → /sk:end → Security scan (may fail) → Commit to branch → Merge to main
                                                                      ↑ No gate here
Proposed Solution:
Implement mandatory pre-merge security gates that prevent merging to main if critical security issues exist:
- Secret Scanning
  - Scan for API keys, tokens, passwords, private keys
  - Tools: GitGuardian, TruffleHog, detect-secrets
  - Block merge if secrets detected
- Static Application Security Testing (SAST)
  - Analyze code for security vulnerabilities
  - Check for SQL injection, XSS, insecure crypto, etc.
  - Tools: Bandit (Python), ESLint security plugins (JS/TS), Semgrep
  - Block merge if critical/high vulnerabilities found
- Dependency Vulnerability Scanning
  - Check for known CVEs in dependencies
  - Tools: Safety (Python), npm audit (JS/TS), Snyk
  - Block merge if critical CVEs exist
- Supply Chain Security
  - Detect malicious or compromised packages
  - Verify package signatures and checksums
  - Tools: Sigstore, Socket Security
- License Compliance
  - Ensure dependencies use compatible licenses
  - Flag GPL in proprietary projects, etc.
  - Tools: license-checker, FOSSA
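To make the secret-scanning gate concrete, here is a minimal sketch of a regex-based scan that produces a blocking exit code. It is not how GitGuardian, TruffleHog, or detect-secrets work internally; the pattern set, `scan_text`, and `gate_exit_code` are illustrative assumptions, and a real gate would delegate to one of the tools listed above.

```python
import re

# Illustrative patterns only; dedicated scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "hardcoded_credential": re.compile(
        r"(?i)\b(?:api_key|secret|password|token)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) pairs for every suspicious line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


def gate_exit_code(findings):
    """Exit code for a hook or CI job: 1 blocks the merge, 0 lets it through."""
    return 1 if findings else 0
```

Returning an exit code rather than raising keeps the function usable from both a Git hook and a CI step, which matches the "exits with code 1" behavior described for the pre-merge hook.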
Implementation:
Pre-merge hook (Git or CI/CD):
# .git/hooks/pre-push or CI workflow
solokit security-scan --pre-merge
# → Runs all security checks
# → Exits with code 1 if critical issues found
# → Blocks push/merge
Quality gate integration:
# Note: This file will be created during implementation
# src/solokit/quality/security_gates.py
def run_pre_merge_security_gates():
    """Run every pre-merge security check and report whether all passed."""
    # Assumption: each helper below returns True when its check passes.
    results = {
        "secret_scan": run_secret_scanning(),
        "sast": run_static_analysis(),
        "dependencies": scan_dependencies(),
        "supply_chain": check_supply_chain(),
        "licenses": check_license_compliance(),
    }
    all_passed = all(results.values())
    return all_passed, results
Files Affected:
New:
- src/solokit/security/secret_scanner.py - Secret detection (will be created)
- src/solokit/security/sast_scanner.py - Static analysis (will be created)
- src/solokit/security/dependency_scanner.py - CVE checking (will be created)
- src/solokit/security/supply_chain_checker.py - Package verification (will be created)
- src/solokit/security/license_checker.py - License compliance (will be created)
- src/solokit/quality/security_gates.py - Pre-merge gate orchestration (will be created)
- .git/hooks/pre-push - Git hook for local enforcement
- Tests for all security modules
Modified:
- src/solokit/session/complete.py - Integrate pre-merge security gates
- .session/config.json - Add security gate configuration
- CI/CD workflows - Add security gate job
Benefits:
- Prevents secret leaks: Catches credentials before they reach remote
- Blocks vulnerable code: No critical security issues in production
- Supply chain protection: Detects malicious dependencies
- Compliance assurance: Legal risks from licenses caught early
- Developer awareness: Immediate feedback on security issues
- Audit trail: All security decisions documented
Priority: Critical - Security is foundational, must be enforced before anything reaches production
Enhancement #17: Continuous Security Monitoring
Status: 🔵 IDENTIFIED
Problem:
Security is currently checked only during development sessions. Between sessions, new vulnerabilities may be discovered (CVEs published), and the codebase remains unmonitored. This creates a security gap:
- Newly disclosed vulnerabilities: New CVEs published for existing dependencies
- Unmaintained dependencies: Libraries deprecated or abandoned
- Drift from security best practices: New security advisories not applied
- No proactive alerting: Developer only finds issues when starting next session
Current Gap:
Session 1: Security scan ✅ → Time passes (days/weeks) → New CVE published! → No alert
→ Session 2: User unaware
Proposed Solution:
Implement continuous security monitoring that runs scheduled scans and alerts developers of new security issues:
- Scheduled CVE Scanning
  - Daily/weekly scans for new CVEs in dependencies
  - Compare against CVE databases (NVD, GitHub Advisory)
  - Generate alerts for critical/high severity issues
- Dependency Update Monitoring
  - Track security patches for dependencies
  - Automatically create work items for critical updates
  - Suggest safe update paths (minor vs major version changes)
- Security Advisory Notifications
  - Subscribe to security advisories for frameworks used
  - Alert on new attack vectors or best practice changes
  - Generate remediation work items
- License Compliance Monitoring
  - Track dependency license changes
  - Alert on new incompatible licenses
  - Monitor for license violations
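The "safe update path" idea can be sketched as a small classifier that compares the installed version with the version that fixes an advisory. This is an illustration under simplifying assumptions: versions are plain MAJOR.MINOR.PATCH strings, the advisory dictionaries (`package`, `fixed_in`, `severity`) are a made-up shape, and `classify_update` and `plan_updates` are hypothetical names.

```python
def parse_version(version):
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints (plain semver only)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_update(current, fixed_in):
    """Return 'none', 'patch', 'minor', or 'major' for the required upgrade."""
    cur, new = parse_version(current), parse_version(fixed_in)
    if new <= cur:
        return "none"  # already on a fixed version
    if new[0] != cur[0]:
        return "major"
    if new[1] != cur[1]:
        return "minor"
    return "patch"


def plan_updates(installed, advisories):
    """Build an update plan; critical/high findings are flagged for work items."""
    plan = []
    for advisory in advisories:
        current = installed.get(advisory["package"])
        if current is None:
            continue  # package is not installed, so the advisory does not apply
        kind = classify_update(current, advisory["fixed_in"])
        if kind == "none":
            continue
        plan.append({
            "package": advisory["package"],
            "from": current,
            "to": advisory["fixed_in"],
            "update": kind,
            "create_work_item": advisory["severity"] in {"critical", "high"},
        })
    return plan
```

Patch and minor updates are usually the lower-risk path, so a monitor could apply or propose those automatically and reserve major upgrades for a dedicated work item.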
Implementation:
Scheduled monitoring (GitHub Actions, cron):
# .github/workflows/security-monitoring.yml
name: Security Monitoring
on:
  schedule:
    - cron: '0 9 * * *'  # Daily at 09:00 UTC
jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run security monitoring
        run: solokit security-monitor --create-work-items
Monitoring system:
# Note: This file will be created during implementation
# src/solokit/security/monitor.py
class SecurityMonitor:
    def scan_for_new_cves(self):
        """Check dependencies against CVE databases."""

    def check_for_updates(self):
        """Find security updates for dependencies."""

    def create_security_work_items(self, findings):
        """Auto-create work items for critical issues."""

    def notify_developer(self, critical_issues):
        """Send an email/Slack notification."""
Files Affected:
New:
- src/solokit/security/monitor.py - Continuous monitoring system (will be created)
- src/solokit/security/cve_database.py - CVE lookup and caching (will be created)
- src/solokit/security/advisory_tracker.py - Security advisory tracking (will be created)
- .github/workflows/security-monitoring.yml - Scheduled workflow (will be created)
- Tests for monitoring system
Modified:
- src/solokit/work_items/creator.py - Auto-create security work items
- .session/config.json - Add monitoring configuration
- src/solokit/notifications/ - Alert mechanisms (email, Slack) (will be created)
Benefits:
- Proactive security: Find vulnerabilities before attackers
- Fast awareness of new CVEs: Alerts raised by the next scheduled scan rather than the next session
- Reduced exposure window: Faster response to security issues
- Automated remediation: Work items auto-created
- Compliance: Continuous license monitoring
- Peace of mind: Always know security status
Priority: High - Continuous protection is essential for production systems
Enhancement #18: Test Quality Gates
Status: 🔵 IDENTIFIED
Problem:
Currently, tests are required but there's no validation of test quality. This allows:
- Weak tests: Tests that always pass regardless of code correctness
- Insufficient coverage: Critical paths untested
- Missing test types: No integration or E2E tests
- Performance regressions: No performance test baseline
- Flaky tests: Unreliable tests that randomly fail
Example of weak tests:
def test_user_authentication():
    result = authenticate_user("user", "pass")
    assert result is not None  # ❌ Always passes even if auth is broken
Current Gap:
Tests written → All tests pass ✅ → Merge
↑ But tests might be weak or incomplete
Proposed Solution:
Implement test quality gates that enforce test effectiveness:
- Critical Path Coverage
  - Identify critical paths (authentication, payment, data loss scenarios)
  - Require >90% coverage for critical paths
  - Tools: coverage.py with path analysis, Istanbul
- Mutation Testing
  - Inject bugs into code, ensure tests catch them
  - Mutation score must meet threshold (e.g., >75%)
  - Tools: Stryker (JS/TS), mutmut (Python)
- Integration Test Requirements
  - Require integration tests for multi-component features
  - Validate data flow across components
  - Minimum number of integration tests per work item type
- E2E Test Requirements (for web apps)
  - Require E2E tests for user-facing features
  - Validate complete user workflows
  - Tools: Playwright, Cypress, Selenium
- Performance Regression Tests
  - Establish performance baselines
  - Fail if performance degrades beyond threshold (e.g., >10% slower)
  - Track response times, throughput, resource usage
- Test Reliability (Flakiness Detection)
  - Detect flaky tests (inconsistent pass/fail)
  - Quarantine flaky tests
  - Require fixing before merge
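Three of the gates above reduce to small numeric checks, sketched below. The function names (`is_flaky`, `mutation_score`, `has_regressed`) and the rerun count are assumptions for illustration; tools such as mutmut or Stryker would supply the killed and total mutant counts in practice.

```python
def is_flaky(test_fn, runs=10):
    """Rerun a test callable; it is flaky if it both passes and fails."""
    outcomes = set()
    for _ in range(runs):
        try:
            test_fn()
            outcomes.add("pass")
        except AssertionError:
            outcomes.add("fail")
    return len(outcomes) > 1


def mutation_score(killed, total):
    """Fraction of injected mutants that the test suite detected."""
    return killed / total if total else 0.0


def has_regressed(baseline_ms, current_ms, threshold=0.10):
    """True when the current timing is more than `threshold` slower than baseline."""
    return current_ms > baseline_ms * (1 + threshold)
```

Rerunning in-process only catches nondeterminism inside the test itself; flakiness caused by ordering or shared state across tests would need whole-suite reruns.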
Implementation:
Test quality gate:
# Note: This file will be created during implementation
# src/solokit/quality/test_quality_gates.py
class TestQualityGates:
    def check_critical_path_coverage(self, work_item):
        """Verify critical paths have >90% coverage."""

    def run_mutation_testing(self):
        """Run mutation tests, check score."""

    def validate_integration_tests(self, work_item):
        """Ensure integration tests exist."""

    def check_e2e_tests(self, work_item):
        """For UI work items, verify E2E tests."""

    def check_performance_regression(self):
        """Compare against baseline."""
Work item spec integration:
## Testing Requirements
**Critical Paths:** (auto-detected or specified)
- User authentication flow
- Payment processing
- Data backup/restore
**Required Test Types:**
- [x] Unit tests (>85% coverage)
- [x] Integration tests (≥3 scenarios)
- [x] E2E tests (main user workflow)
- [x] Performance tests (baseline established)
**Mutation Score Target:** 75%
Files Affected:
New:
- src/solokit/quality/test_quality_gates.py - Test quality validation (will be created)
- src/solokit/testing/mutation_runner.py - Mutation testing integration (will be created)
- src/solokit/testing/critical_path_analyzer.py - Critical path identification (will be created)
- src/solokit/testing/flakiness_detector.py - Flaky test detection (will be created)
- src/solokit/testing/performance_baseline.py - Performance tracking (will be created)
- Tests for all test quality modules
Modified:
- src/solokit/session/complete.py - Add test quality gates
- src/solokit/session/validate.py - Add validation checks
- src/solokit/work_items/spec_parser.py - Parse testing requirements
- .session/config.json - Test quality thresholds
Benefits:
- Confidence in tests: Know tests actually catch bugs
- Prevents regressions: Performance baselines protect against degradation
- Complete coverage: All test types required
- Reliable builds: No flaky tests breaking CI
- Quality assurance: Tests verified to be effective
Priority: High - Quality tests are essential for reliable software
Enhancement #19: Advanced Code Quality Gates
Status: 🔵 IDENTIFIED
Problem:
Current linting only catches basic style issues. Complex code quality problems go undetected:
- High complexity: Functions with cyclomatic complexity >10 are hard to maintain
- Code duplication: Copy-pasted code creates maintenance burden
- Dead code: Unused functions and imports waste space and create confusion
- Weak typing: TypeScript without strict mode allows bugs
Example:
def process_order(order):  # Complexity: 23 ❌ Too complex
    if order.status == "pending":
        if order.payment_method == "card":
            if order.card_valid:
                if order.inventory_available:
                    # ... 50 more lines of nested ifs
                    pass  # placeholder for the elided body
Current Gap:
Linting passes ✅ → Merge
↑ Complex, duplicated, dead code still merges
Proposed Solution:
Implement advanced code quality gates that enforce maintainability:
- Cyclomatic Complexity Enforcement
  - Fail if function complexity >10
  - Suggest breaking down complex functions
  - Tools: radon (Python), complexity-report (JS/TS)
- Code Duplication Detection
  - Detect copy-pasted code blocks
  - Fail if duplication >5% of codebase
  - Tools: jscpd, pylint duplicate-code
- Dead Code Detection
  - Find unused functions, classes, imports
  - Require removal before merge
  - Tools: vulture (Python), ts-prune (TypeScript)
- Type Coverage Enforcement (TypeScript)
  - Require strict mode in tsconfig.json
  - Fail if any types are used without justification
  - Measure type coverage percentage
- Cognitive Complexity
  - Measure how hard code is to understand
  - Complement cyclomatic complexity
  - Tools: SonarQube, CodeClimate
- Code Documentation Standards
  - Enforce documentation for public APIs, classes, and functions
  - Validate docstring/JSDoc completeness and quality
  - Require parameter and return type documentation
  - Check for outdated documentation (code changes without doc updates)
  - Generate missing documentation warnings
  - Tools: pydocstyle, JSDoc validation, custom documentation linters
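As a rough illustration of the complexity gate, the sketch below approximates cyclomatic complexity with the standard-library `ast` module by counting decision points. It is a simplified stand-in for a tool like radon, not a reimplementation of it: nested functions are counted toward their enclosing function, and `cyclomatic_complexity` and `find_complex_functions` are hypothetical names.

```python
import ast

# Node types that each add one decision point in this simplified count.
BRANCH_NODES = (
    ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.comprehension,
)


def cyclomatic_complexity(func_node):
    """Approximate complexity: 1 plus the number of decision points."""
    complexity = 1
    for node in ast.walk(func_node):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' contributes two extra paths.
            complexity += len(node.values) - 1
    return complexity


def find_complex_functions(source, max_complexity=10):
    """Return {function_name: score} for functions above the threshold."""
    report = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = cyclomatic_complexity(node)
            if score > max_complexity:
                report[node.name] = score
    return report
```

A gate built on this would fail the session when the returned report is non-empty, using the `max_complexity` value from configuration.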
Implementation:
Code quality gate:
# Note: This file will be created during implementation
# src/solokit/quality/code_quality_gates.py
class CodeQualityGates:
    def check_complexity(self, file_changes):
        """Analyze cyclomatic complexity."""
        # Fail if any function >10

    def detect_duplication(self):
        """Scan for code duplication."""
        # Fail if >5% duplicated

    def find_dead_code(self):
        """Detect unused code."""
        # Report for removal

    def check_type_coverage(self):
        """For TypeScript: verify strict mode."""
        # Check for excessive `any` usage

    def validate_documentation(self, file_changes):
        """Check for missing docstrings/JSDoc."""
        # Validate documentation completeness
        # Detect outdated documentation
        # Generate documentation warnings
Configuration:
// .session/config.json
"code_quality_gates": {
  "complexity": {
    "enabled": true,
    "max_complexity": 10,
    "max_cognitive_complexity": 15
  },
  "duplication": {
    "enabled": true,
    "max_percentage": 5,
    "min_tokens": 50
  },
  "dead_code": {
    "enabled": true,
    "fail_on_unused": true
  },
  "type_coverage": {
    "enabled": true,
    "require_strict_mode": true,
    "max_any_percentage": 2
  },
  "documentation": {
    "enabled": true,
    "require_public_api_docs": true,
    "require_param_docs": true,
    "require_return_docs": true,
    "min_docstring_length": 20,
    "check_outdated_docs": true
  }
}
Files Affected:
New:
- src/solokit/quality/code_quality_gates.py - Code quality validation (will be created)
- src/solokit/analysis/complexity_analyzer.py - Complexity calculation (will be created)
- src/solokit/analysis/duplication_detector.py - Duplication detection (will be created)
- src/solokit/analysis/dead_code_finder.py - Dead code detection (will be created)
- src/solokit/analysis/type_coverage.py - TypeScript type coverage (will be created)
- src/solokit/analysis/documentation_validator.py - Code documentation validation (will be created)
- Tests for all analysis modules
Modified:
- src/solokit/session/complete.py - Add code quality gates
- src/solokit/session/validate.py - Add validation checks
- .session/config.json - Code quality thresholds
Benefits:
- Maintainable code: Low complexity = easy to understand
- DRY principle: No code duplication
- Clean codebase: No dead code clutter
- Type safety: Strong typing prevents bugs
- Technical debt prevention: Quality enforced continuously
- Better documentation: Code is self-documenting and easier to understand
- Onboarding efficiency: New developers can understand code faster with proper documentation
Priority: Medium-High - Prevents technical debt accumulation
Enhancement #20: Production Readiness Gates
Status: 🔵 IDENTIFIED
Problem:
Code may pass all tests but still not be ready for production. Production-specific requirements are not validated:
- No health checks: Can't monitor service health
- No observability: Can't debug production issues
- No error tracking: Errors fail silently
- Inconsistent logging: Can't trace requests
- Unsafe migrations: Database changes cause downtime
Example of production failure:
All tests pass ✅ → Deploy to production → Service starts
→ Health check missing ❌
→ Load balancer can't detect failures
→ Site down, no alerts
Proposed Solution:
Implement production readiness gates that validate operational requirements:
- Health Check Endpoints
  - Require /health and /ready endpoints
  - Validate they return proper status codes
  - Test health check logic actually works
- Metrics and Observability
  - Require /metrics endpoint (Prometheus format)
  - Validate metrics exported (request count, latency, errors)
  - Ensure distributed tracing configured (OpenTelemetry)
- Error Tracking Integration
  - Require error tracking setup (Sentry, Rollbar, etc.)
  - Validate errors are captured and reported
  - Test error grouping and notification
- Structured Logging
  - Enforce structured logging (JSON format)
  - Require correlation IDs for request tracing
  - Validate log levels appropriate
- Database Migration Safety
  - Require reversible migrations
  - Test migrations on staging data
  - Validate migration doesn't cause downtime
- Configuration Management
  - All config via environment variables
  - No secrets in code or version control
  - Validate required env vars documented
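For the structured-logging requirement, a gate needs something concrete to validate. The sketch below shows one possible shape using only the standard library: a JSON formatter that carries a correlation ID passed through `extra`. The field names and the `build_logger` helper are assumptions, not an existing Solokit API.

```python
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Set via logger.info(..., extra={"correlation_id": ...}).
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)


def build_logger(name, stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def new_correlation_id():
    """Generate an ID to attach to every log line of one request."""
    return uuid.uuid4().hex
```

A readiness check could then parse a sample of log lines as JSON and fail if any line lacks a non-empty `correlation_id`.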
Implementation:
Production readiness gate:
# src/solokit/quality/production_gates.py
class ProductionReadinessGates:
    def validate_health_endpoints(self):
        """Check /health and /ready exist."""
        # Test they return 200

    def validate_metrics(self):
        """Check /metrics endpoint."""
        # Validate metrics format

    def validate_error_tracking(self):
        """Verify error tracking configured."""
        # Test error capture works

    def validate_logging(self):
        """Check structured logging."""
        # Verify correlation IDs

    def validate_migrations(self):
        """Test migrations reversible."""
        # Validate no downtime
Work item checklist for deployment:
## Production Readiness
- [x] Health check endpoints implemented and tested
- [x] Metrics exported (Prometheus format)
- [x] Error tracking configured and tested
- [x] Structured logging with correlation IDs
- [x] Database migrations tested and reversible
- [x] Required environment variables documented
- [x] Secrets managed via secrets manager (not in code)
Files Affected:
New:
- src/solokit/quality/production_gates.py - Production readiness validation (will be created)
- src/solokit/production/health_check_validator.py - Health check testing (will be created)
- src/solokit/production/metrics_validator.py - Metrics validation (will be created)
- src/solokit/production/migration_validator.py - Migration safety checks (will be created)
- Tests for production validation
Modified:
- src/solokit/session/complete.py - Add production gates for deployment work items
- src/solokit/templates/deployment.md - Add production checklist
- .session/config.json - Production requirements configuration
Benefits:
- Operational visibility: Always know service health
- Faster debugging: Logs and traces available
- Proactive alerting: Errors tracked and reported
- Safe deployments: Migrations tested and reversible
- Production confidence: All operational needs met
Priority: High - Essential for production deployments
Enhancement #21: Deployment Safety Gates
Status: 🔵 IDENTIFIED
Problem:
Deployments can fail or cause outages even with good code:
- Untested deployments: Deployment procedure never practiced
- Breaking changes: API changes break clients
- No rollback plan: Can't revert if deployment fails
- Risky releases: All changes deployed at once
Example of deployment failure:
Code ready → Deploy to production → API change breaks mobile app
→ No rollback procedure ❌
→ Site down for hours
Proposed Solution:
Implement deployment safety gates that validate deployment readiness:
- Deployment Dry-Run
  - Test deployment procedure in staging
  - Validate all deployment steps work
  - Ensure no manual steps required
- Breaking Change Detection
  - Detect API changes (endpoints removed, fields changed)
  - Validate backward compatibility
  - Require versioning for breaking changes
  - Tools: OpenAPI diff, GraphQL schema comparison
- Rollback Testing
  - Test rollback procedure before deployment
  - Validate rollback completes successfully
  - Document rollback steps
- Canary Deployment Support
  - Gradual rollout to small percentage of users
  - Monitor metrics during rollout
  - Automatic rollback if errors spike
- Smoke Tests
  - Run smoke tests after deployment
  - Validate critical paths work
  - Automatic rollback if smoke tests fail
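Breaking-change detection and the canary rollback rule can both be sketched as simple comparisons. The schema shape below (endpoint mapped to a dictionary of field names and type names) is a deliberately simplified assumption rather than OpenAPI, and `detect_breaking_changes`, `should_rollback`, and the one-percentage-point error budget are hypothetical.

```python
def detect_breaking_changes(old_api, new_api):
    """Compare two {endpoint: {field: type}} maps and list breaking changes."""
    problems = []
    for endpoint, old_fields in old_api.items():
        if endpoint not in new_api:
            problems.append(f"endpoint removed: {endpoint}")
            continue
        new_fields = new_api[endpoint]
        for field, old_type in old_fields.items():
            if field not in new_fields:
                problems.append(f"field removed: {endpoint}.{field}")
            elif new_fields[field] != old_type:
                problems.append(
                    f"type changed: {endpoint}.{field} "
                    f"({old_type} -> {new_fields[field]})"
                )
    return problems


def should_rollback(baseline_error_rate, canary_error_rate, max_increase=0.01):
    """Roll back when the canary's error rate exceeds baseline by the budget."""
    return canary_error_rate - baseline_error_rate > max_increase
```

Added endpoints and added fields are not reported, on the assumption that additive changes stay backward compatible for existing clients.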
Implementation:
Deployment safety gate:
# src/solokit/deployment/safety_gates.py
class DeploymentSafetyGates:
    def run_dry_run(self, deployment_config):
        """Test deployment in staging."""

    def detect_breaking_changes(self):
        """Compare API schemas."""
        # Detect breaking changes

    def test_rollback(self):
        """Execute rollback procedure."""
        # Validate success

    def setup_canary(self, percentage):
        """Configure canary deployment."""
        # Set up monitoring

    def run_smoke_tests(self):
        """Execute smoke tests."""
        # Return results
Deployment workflow:
# .github/workflows/deploy.yml
jobs:
  pre-deployment:
    - Dry-run in staging
    - Detect breaking changes
    - Test rollback procedure
  deploy:
    - Canary to 5% of users
    - Monitor for 10 minutes
    - If metrics good, continue
    - If errors spike, rollback
    - Gradually increase to 100%
  post-deployment:
    - Run smoke tests
    - Verify health checks
    - Monitor for issues
Files Affected:
New:
- src/solokit/deployment/safety_gates.py - Deployment validation (will be created)
- src/solokit/deployment/dry_run.py - Dry-run execution (will be created)
- src/solokit/deployment/breaking_change_detector.py - API diff analysis (will be created)
- src/solokit/deployment/rollback_tester.py - Rollback validation (will be created)
- src/solokit/deployment/canary.py - Canary deployment orchestration (will be created)
- src/solokit/deployment/smoke_tests.py - Smoke test runner (will be created)
- Tests for deployment safety
Modified:
- CI/CD workflows - Add deployment gates
- src/solokit/templates/deployment.md - Add safety checklist
- .session/config.json - Deployment safety configuration
Benefits:
- Safe deployments: Tested before production
- No breaking changes: Backward compatibility validated
- Quick recovery: Rollback always available
- Gradual rollouts: Canary reduces risk
- Confidence: Deploy without fear
Priority: High - Essential for production stability
Enhancement #22: Disaster Recovery & Backup Automation
Status: 🔵 IDENTIFIED
Problem:
Production systems lack comprehensive disaster recovery and backup automation:
- No automated backups: Critical data loss risk if manual backups forgotten
- Untested recovery procedures: Backups may be corrupt or incomplete
- No disaster recovery plan: No documented procedure for system restoration
- No data retention policies: Old backups accumulate or critical data deleted too soon
- Single point of failure: No geographic redundancy or failover capability
Example of failure:
Production running → Database corruption ❌
→ No recent backup
→ Or backup exists but restore untested
→ Or backup incomplete (missing files/secrets)
→ Hours/days of data loss
→ Extended downtime
Proposed Solution:
Implement comprehensive disaster recovery and backup automation system:
- Automated Backup Strategy
  - Automated database backups (full, incremental, differential)
  - Automated file system backups
  - Automated configuration and secrets backup (encrypted)
  - Automated infrastructure state backup (IaC state files)
  - Customizable backup schedules (hourly, daily, weekly)
- Backup Verification & Testing
  - Automated backup integrity checks (checksum validation)
  - Automated restore testing in isolated environment
  - Backup completeness validation (all critical data included)
  - Corruption detection and alerting
  - Test restore performance benchmarks
- Disaster Recovery Planning
  - Automated DR plan generation based on system architecture
  - Recovery Time Objective (RTO) and Recovery Point Objective (RPO) tracking
  - Step-by-step recovery procedures (runbooks)
  - Automated failover procedures for critical services
  - Business continuity documentation
- Data Retention & Lifecycle Management
  - Configurable retention policies (7 days, 30 days, 1 year, etc.)
  - Automated old backup cleanup
  - Compliance with data retention regulations
  - Backup versioning and point-in-time recovery
  - Archive to cold storage for long-term retention
- Geographic Redundancy
  - Multi-region backup replication
  - Automated cross-region failover testing
  - Geo-distributed backup storage
  - Regional disaster scenario testing
- Recovery Procedures
  - One-command full system restore
  - Selective data recovery (specific tables, files, configs)
  - Point-in-time recovery (restore to specific timestamp)
  - Dry-run recovery testing (test without affecting production)
  - Recovery progress monitoring and ETA
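Two of the building blocks above, checksum validation and retention handling, are small enough to sketch directly. The example uses SHA-256 from the standard library and age thresholds that mirror the 90-day archive and 365-day retention values used in the backup configuration; `sha256_of`, `verify_backup`, and `classify_backup` are hypothetical names.

```python
import hashlib
from datetime import date


def sha256_of(data: bytes) -> str:
    """Hex digest recorded when the backup is written."""
    return hashlib.sha256(data).hexdigest()


def verify_backup(data: bytes, expected_checksum: str) -> bool:
    """Integrity check: recompute the digest and compare with the stored one."""
    return sha256_of(data) == expected_checksum


def classify_backup(created: date, today: date,
                    archive_after: int = 90, delete_after: int = 365) -> str:
    """Decide what the retention job should do with a backup of this age."""
    age_days = (today - created).days
    if age_days > delete_after:
        return "delete"
    if age_days > archive_after:
        return "archive"  # move to cold storage
    return "keep"
```

A matching checksum only shows the bytes are unchanged; it does not prove the backup restores cleanly, which is why the restore tests listed above remain a separate step.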
Implementation:
Backup orchestrator:
# src/solokit/disaster_recovery/backup_manager.py
class BackupManager:
    def schedule_backups(self, config):
        """Schedule automated backups based on config."""
        # - Database backups
        # - File system backups
        # - Configuration backups
        # - Infrastructure state backups

    def verify_backup(self, backup_id):
        """Verify backup integrity."""
        # - Checksum validation
        # - Completeness check
        # - Size validation

    def test_restore(self, backup_id, test_env):
        """Test restore in isolated environment."""
        # - Spin up test environment
        # - Restore backup
        # - Validate data integrity
        # - Measure restore time
        # - Cleanup test environment
Disaster recovery planner:
# src/solokit/disaster_recovery/dr_planner.py
class DisasterRecoveryPlanner:
    def generate_dr_plan(self, architecture):
        """Analyze system architecture and generate a disaster recovery plan."""
        # - Identify critical components
        # - Define recovery priorities
        # - Create recovery procedures
        # - Calculate RTO/RPO

    def validate_dr_plan(self):
        """Test disaster recovery plan."""
        # - Simulate disaster scenarios
        # - Execute recovery procedures
        # - Measure recovery time
        # - Identify gaps and improvements
Recovery executor:
# src/solokit/disaster_recovery/recovery_executor.py
class RecoveryExecutor:
    def full_system_restore(self, backup_id, target_env):
        """Restore entire system from backup."""
        # - Restore infrastructure
        # - Restore database
        # - Restore file system
        # - Restore configurations
        # - Verify system health

    def selective_restore(self, backup_id, resources):
        """Restore specific resources."""
        # - Specific database tables
        # - Specific files
        # - Specific configurations

    def point_in_time_restore(self, timestamp, target_env):
        """Restore system to specific point in time."""
        # - Find appropriate backups
        # - Restore and replay logs
        # - Verify data consistency
Backup configuration:
# .session/config.json or .solokit/backup_config.yml
backup_config:
  schedule:
    database:
      full: "0 2 * * 0"  # Weekly full backup (Sunday 2 AM)
      incremental: "0 2 * * 1-6"  # Daily incremental
      continuous: true  # Continuous log shipping
    filesystem:
      frequency: "0 3 * * *"  # Daily at 3 AM
      exclude_patterns:
        - "node_modules/"
        - "*.log"
        - ".git/"
    infrastructure:
      frequency: "0 4 * * *"  # Daily at 4 AM
      include:
        - terraform_state
        - kubernetes_manifests
        - ci_cd_configs
  retention:
    short_term: 7  # 7 days
    medium_term: 30  # 30 days
    long_term: 365  # 1 year
    archive_after: 90  # Move to cold storage after 90 days
  verification:
    integrity_check: true  # Always verify checksums
    test_restore_frequency: "0 5 * * 0"  # Weekly restore test
    test_environment: "dr-test"
  storage:
    primary_region: "us-east-1"
    replica_regions:
      - "us-west-2"
      - "eu-west-1"
    encryption: "AES-256"
  recovery_objectives:
    rto: "1h"  # Recovery Time Objective
    rpo: "15m"  # Recovery Point Objective (max data loss)
  notifications:
    backup_failures: ["email", "slack"]
    verification_failures: ["email", "pagerduty"]
    recovery_tests: ["email"]
DR plan template:
# Disaster Recovery Plan
## Recovery Objectives
- **RTO (Recovery Time Objective)**: 1 hour
- **RPO (Recovery Point Objective)**: 15 minutes
## Critical Components (Priority Order)
1. Database (PostgreSQL)
2. Application servers
3. File storage
4. Cache layer
5. Background workers
## Disaster Scenarios
### Scenario 1: Database Corruption
**Detection**: Health checks fail, query errors
**Recovery Procedure**:
1. Stop application servers (prevent further corruption)
2. Identify last known good backup
3. Restore database from backup: `solokit dr restore-database --backup-id <id>`
4. Replay transaction logs from backup point to current
5. Verify data integrity: `solokit dr verify-database`
6. Restart application servers
7. Monitor for errors
**Estimated Recovery Time**: 30 minutes
### Scenario 2: Complete Infrastructure Loss
**Detection**: All services unreachable
**Recovery Procedure**:
1. Activate secondary region: `solokit dr activate-failover --region us-west-2`
2. Restore infrastructure: `solokit dr restore-infrastructure --backup-id <id>`
3. Restore database: `solokit dr restore-database --backup-id <id> --region us-west-2`
4. Update DNS to point to new region
5. Verify all services operational
6. Notify stakeholders
**Estimated Recovery Time**: 1 hour
### Scenario 3: Data Deletion (Human Error)
**Detection**: Reports of missing data
**Recovery Procedure**:
1. Identify deletion timestamp
2. Point-in-time restore: `solokit dr restore-point-in-time --timestamp "2025-10-29T10:30:00Z"`
3. Extract affected data
4. Merge recovered data into production
5. Verify data integrity
**Estimated Recovery Time**: 20 minutes
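The estimated recovery times in the template can be checked against the stated RTO mechanically. The following is a minimal sketch, not part of Solokit: the `parse_duration` and `scenarios_exceeding_rto` helpers and the scenario keys are hypothetical, and the duration format (`"1h"`, `"15m"`) is assumed to match the `rto`/`rpo` values in the backup configuration.

```python
import re
from datetime import timedelta

# Hypothetical helper: maps the unit suffix to a timedelta keyword argument.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Convert strings such as "1h" or "15m" into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"Unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def scenarios_exceeding_rto(estimates: dict[str, str], rto: str) -> list[str]:
    """Return the scenarios whose estimated recovery time exceeds the RTO."""
    limit = parse_duration(rto)
    return [name for name, est in estimates.items() if parse_duration(est) > limit]


# Estimates copied from the three scenarios in the template above.
estimates = {
    "database_corruption": "30m",
    "complete_infrastructure_loss": "1h",
    "data_deletion": "20m",
}
print(scenarios_exceeding_rto(estimates, rto="1h"))  # prints []
```

A check like this could run whenever the DR plan is regenerated, so that a scenario whose estimate drifts past the RTO is flagged rather than discovered during an incident.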
Commands:
# Configure backup system
/sk:dr-init
# View backup status
/sk:dr-status
# Test disaster recovery plan
/sk:dr-test [--scenario SCENARIO]
# Restore from backup
/sk:dr-restore [--backup-id ID] [--point-in-time TIMESTAMP]
# Verify backups
/sk:dr-verify-backups
# Generate DR plan
/sk:dr-plan-generate
Files Affected:
New:
- `src/solokit/disaster_recovery/backup_manager.py` - Backup orchestration (will be created)
- `src/solokit/disaster_recovery/dr_planner.py` - DR plan generation (will be created)
- `src/solokit/disaster_recovery/recovery_executor.py` - Recovery execution (will be created)
- `src/solokit/disaster_recovery/backup_verifier.py` - Backup verification (will be created)
- `src/solokit/disaster_recovery/retention_manager.py` - Data lifecycle management (will be created)
- `.claude/commands/dr-init.md` - DR initialization command (will be created)
- `.claude/commands/dr-status.md` - DR status command (will be created)
- `.claude/commands/dr-test.md` - DR testing command (will be created)
- `.claude/commands/dr-restore.md` - Recovery command (will be created)
- `docs/disaster_recovery_plan.md` - Generated DR plan (will be created)
- `.solokit/backup_config.yml` - Backup configuration (will be created)
- Tests for DR system
Modified:
- `src/solokit/project/init.py` - Add DR setup to project initialization
- `.session/config.json` - Add DR configuration section
- CI/CD workflows - Add backup verification jobs
Benefits:
- Data protection: Automated backups prevent data loss
- Business continuity: Quick recovery from disasters
- Tested recovery: Regular restore testing ensures backups work
- Compliance: Meet data retention and backup requirements
- Peace of mind: Know you can recover from any disaster
- Geographic redundancy: Protected against regional failures
- Documented procedures: Clear recovery steps for any scenario
- Minimal downtime: Fast recovery meets RTO/RPO objectives
Priority: Critical - Data loss and prolonged outages can be catastrophic
Notes:
- Backup storage costs should be factored into project budget
- Recovery testing should be scheduled during low-traffic periods
- DR plan should be reviewed and updated quarterly
- Encryption keys and secrets must be backed up separately and securely
- Team should be trained on recovery procedures
Enhancement #23: JSON Schema Spec Validation
Status: 🔵 IDENTIFIED
Problem:
Current spec validation only checks for section presence (completeness checking). This creates issues:
- No structure validation: Sections may be present but empty or malformed
- No type checking: Cannot enforce that "Priority" is one of [critical, high, medium, low]
- Late error detection: Structural issues found during `/start` or `/end`
- Poor error messages: "Section missing" but not "Section has wrong format"
- No schema evolution: Cannot version spec formats or migrate old specs
Example of current validation gap:
# Current spec file - PASSES current validation
## Acceptance Criteria
(empty section)
## Implementation Details
not a list, just a paragraph
## Priority
Super Urgent!!! ← Not a valid priority value
Current validator: ✅ All sections present, validation passes
Desired validator: ❌ Multiple schema violations detected
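The gap between the two validators can be reproduced in a few lines. The sketch below is illustrative and uses only the standard library; `split_sections`, `presence_check` and `structure_check` are hypothetical names, not the existing Solokit validator, and the two structural rules are a small subset of what a schema would enforce.

```python
import re

VALID_PRIORITIES = {"critical", "high", "medium", "low"}


def split_sections(spec: str) -> dict[str, str]:
    """Map each '## Heading' to the text that follows it."""
    sections: dict[str, str] = {}
    current = None
    for line in spec.splitlines():
        heading = re.match(r"^##\s+(.*)$", line)
        if heading:
            current = heading.group(1).strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def presence_check(spec: str, required: list[str]) -> list[str]:
    """Completeness check only: is every required heading present?"""
    sections = split_sections(spec)
    return [f"Section missing: {name}" for name in required if name not in sections]


def structure_check(spec: str) -> list[str]:
    """Content checks of the kind a schema would enforce."""
    sections = split_sections(spec)
    errors = []
    criteria = [
        ln for ln in sections.get("Acceptance Criteria", "").splitlines()
        if ln.strip().startswith(("-", "*"))
    ]
    if not criteria:
        errors.append("Acceptance Criteria: expected at least one list item")
    priority = sections.get("Priority", "").strip().lower()
    if priority not in VALID_PRIORITIES:
        errors.append(f"Priority: {priority!r} is not one of {sorted(VALID_PRIORITIES)}")
    return errors


spec = """## Acceptance Criteria

## Implementation Details
not a list, just a paragraph
## Priority
Super Urgent!!!
"""
required = ["Acceptance Criteria", "Implementation Details", "Priority"]
print(presence_check(spec, required))  # no errors reported
print(structure_check(spec))           # two violations reported
```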
Proposed Solution:
Implement JSON Schema-based spec validation for rigorous spec structure checking:
- Schema Definitions
  - Define JSON Schema for each work item type
  - Validate section structure, content types, enums
  - Support required vs optional fields
  - Version schemas for backward compatibility
- Markdown-to-Structure Parser
  - Parse markdown specs into structured data
  - Extract sections, lists, metadata
  - Convert to JSON for schema validation
  - Preserve original markdown for human readability
- Comprehensive Validation
  - Schema validation (structure, types, enums)
  - Cross-field validation (dependencies must exist)
  - Custom business rules (time box reasonable for spike)
  - Reference validation (links, file paths)
- Better Error Messages
  - Precise error location (line number, section)
  - Explain what's wrong and how to fix
  - Suggest corrections
  - Examples of correct format
- Schema Migration
  - Detect spec version
  - Migrate old specs to new schema
  - Preserve content during migration
  - Log migration actions
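The schema migration step is not detailed further in this proposal. One possible shape, sketched under assumptions, is a registry of single-step upgrade functions. Everything here is hypothetical: the `schema_version` field, the `MIGRATIONS` registry, and the v1 to v2 rename are invented for illustration.

```python
from typing import Any, Callable

Spec = dict[str, Any]

# Hypothetical registry: each entry upgrades a spec from version N to N + 1.
MIGRATIONS: dict[int, Callable[[Spec], Spec]] = {}


def migration(from_version: int):
    """Decorator that registers an upgrade step for the given version."""
    def register(func):
        MIGRATIONS[from_version] = func
        return func
    return register


@migration(1)
def _rename_criteria(spec: Spec) -> Spec:
    # Assumed v1 -> v2 change, for illustration only.
    spec = dict(spec)  # work on a copy so the original content is preserved
    if "criteria" in spec:
        spec["acceptance_criteria"] = spec.pop("criteria")
    return spec


def migrate(spec: Spec, target: int) -> tuple[Spec, list[str]]:
    """Upgrade a spec step by step and return it with a log of actions."""
    log = []
    version = spec.get("schema_version", 1)  # unversioned specs are treated as v1
    while version < target:
        step = MIGRATIONS.get(version)
        if step is None:
            raise ValueError(f"No migration from version {version}")
        spec = step(spec)
        version += 1
        spec["schema_version"] = version
        log.append(f"migrated v{version - 1} -> v{version}")
    return spec, log


old = {"title": "Add login", "criteria": ["User can sign in"]}
new, log = migrate(old, target=2)
print(new, log)
```

Chaining single-version steps keeps each migration small and lets a spec that is several versions behind be upgraded without a dedicated function for every version pair.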
Implementation:
JSON Schema definitions:
// src/solokit/templates/schemas/feature_spec_schema.json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"$id": "https://solokit.dev/schemas/feature_spec.json",
"title": "Feature Spec",
"description": "Schema for feature work item specifications",
"type": "object",
"required": ["title", "type", "overview", "rationale", "acceptance_criteria"],
"properties": {
"title": {
"type": "string",
"minLength": 5,
"maxLength": 200,
"description": "Feature title"
},
"type": {
"type": "string",
"enum": ["feature"],
"description": "Work item type"
},
"priority": {
"type": "string",
"enum": ["critical", "high", "medium", "low"],
"default": "medium",
"description": "Feature priority"
},
"overview": {
"type": "string",
"minLength": 20,
"description": "Feature overview (what, why, for whom)"
},
"rationale": {
"type": "string",
"minLength": 20,
"description": "Why this feature is needed"
},
"acceptance_criteria": {
"type": "array",
"minItems": 1,
"items": {
"type": "string",
"minLength": 5
},
"description": "List of acceptance criteria"
},
"implementation_details": {
"type": "object",
"properties": {
"approach": {"type": "string"},
"files_to_modify": {
"type": "array",
"items": {"type": "string"}
},
"new_files": {
"type": "array",
"items": {"type": "string"}
},
"dependencies": {
"type": "array",
"items": {"type": "string", "pattern": "^[a-z_]+$"}
}
}
},
"testing_strategy": {
"type": "object",
"properties": {
"unit_tests": {"type": "array", "items": {"type": "string"}},
"integration_tests": {"type": "array", "items": {"type": "string"}},
"test_coverage_target": {
"type": "integer",
"minimum": 0,
"maximum": 100,
"default": 80
}
}
},
"dependencies": {
"type": "array",
"items": {
"type": "string",
"pattern": "^(feature|bug|refactor|security|integration_test|deployment)_[a-z0-9_]+$"
},
"description": "Work item dependencies (must exist)"
}
}
}
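The `pattern` on `dependencies` can be exercised on its own with the standard library. This is a small sketch; the helper name `invalid_dependency_ids` and the sample IDs are made up for illustration.

```python
import re

# Pattern copied from the "dependencies" property of the schema above.
DEPENDENCY_ID = re.compile(
    r"^(feature|bug|refactor|security|integration_test|deployment)_[a-z0-9_]+$"
)


def invalid_dependency_ids(dependencies: list[str]) -> list[str]:
    """Return the IDs that do not match the schema's dependency pattern."""
    return [dep for dep in dependencies if not DEPENDENCY_ID.match(dep)]


deps = ["feature_user_auth", "bug_login_500", "spike_cache_eval", "Feature_Auth"]
print(invalid_dependency_ids(deps))  # prints ['spike_cache_eval', 'Feature_Auth']
```

Note that the pattern hard-codes the six built-in type prefixes, so a dependency on a custom type such as `spike` would be rejected unless the pattern is extended or generated from the registered types.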
Spec parser:
# src/solokit/work_items/spec_parser.py (enhanced)
import json
from pathlib import Path
from typing import Any, Dict, List

import jsonschema
import yaml


class SpecParser:
    """Parse and validate markdown specs against JSON schemas"""

    def __init__(self):
        self.schema_dir = Path("src/solokit/templates/schemas")
        self.schemas = self.load_schemas()

    def load_schemas(self) -> Dict[str, Dict]:
        """Load all JSON schemas"""
        schemas = {}
        for schema_file in self.schema_dir.glob("*_schema.json"):
            work_item_type = schema_file.stem.replace("_spec_schema", "")
            with open(schema_file) as f:
                schemas[work_item_type] = json.load(f)
        return schemas

    def parse_spec_to_structure(self, spec_path: Path) -> Dict[str, Any]:
        """Parse markdown spec into structured data"""
        content = spec_path.read_text()
        # Extract frontmatter (YAML)
        frontmatter = {}
        if content.startswith("---"):
            parts = content.split("---", 2)
            if len(parts) >= 3:
                # safe_load returns None for an empty frontmatter block
                frontmatter = yaml.safe_load(parts[1]) or {}
                content = parts[2]
        # Parse markdown sections
        sections = self.parse_markdown_sections(content)
        # Combine frontmatter and sections
        structure = {**frontmatter, **sections}
        return structure

    def parse_markdown_sections(self, content: str) -> Dict[str, Any]:
        """Parse markdown sections into structured data"""
        sections = {}
        current_section = None
        current_content = []
        for line in content.split("\n"):
            # Detect level-2 headers only; "###" subsections stay in the section body
            if line.startswith("## "):
                # Save previous section
                if current_section:
                    sections[current_section] = self.parse_section_content(
                        current_section, "\n".join(current_content)
                    )
                # Start new section
                current_section = line[3:].strip().lower().replace(" ", "_")
                current_content = []
            else:
                current_content.append(line)
        # Save last section
        if current_section:
            sections[current_section] = self.parse_section_content(
                current_section, "\n".join(current_content)
            )
        return sections

    def parse_section_content(self, section_name: str, content: str) -> Any:
        """Parse section content based on expected type"""
        content = content.strip()
        # List sections (bullet points)
        if section_name in ["acceptance_criteria", "implementation_steps", "test_cases"]:
            items = []
            for line in content.split("\n"):
                line = line.strip()
                # Check checkbox items first: they also start with "-"
                if line.startswith(("- [ ]", "- [x]")):
                    items.append(line[5:].strip())
                elif line.startswith(("-", "*")):
                    items.append(line[1:].strip())
            # Fall back to the raw text so schema validation reports the wrong type
            return items if items else content
        # Object sections (key-value pairs)
        elif section_name in ["implementation_details", "testing_strategy"]:
            # Try to parse as structured data
            try:
                return self.parse_structured_section(content)
            except Exception:
                return {"raw": content}
        # String sections
        else:
            return content

    def parse_structured_section(self, content: str) -> Dict[str, Any]:
        """Parse structured section content (key: value format)"""
        result = {}
        current_key = None
        current_value = []
        for line in content.split("\n"):
            if ":" in line and not line.startswith(" "):
                # Save previous key-value
                if current_key:
                    result[current_key] = "\n".join(current_value).strip()
                # Start new key
                key, value = line.split(":", 1)
                current_key = key.strip().lower().replace(" ", "_")
                current_value = [value.strip()] if value.strip() else []
            else:
                current_value.append(line)
        # Save last key-value
        if current_key:
            result[current_key] = "\n".join(current_value).strip()
        return result

    def validate_spec_structure(
        self,
        spec_path: Path,
        work_item_type: str
    ) -> tuple[bool, List[str]]:
        """Validate spec against JSON schema"""
        # Get schema
        if work_item_type not in self.schemas:
            return False, [f"No schema found for type: {work_item_type}"]
        schema = self.schemas[work_item_type]
        # Parse spec to structure
        try:
            structure = self.parse_spec_to_structure(spec_path)
        except Exception as e:
            return False, [f"Failed to parse spec: {str(e)}"]
        # Reject a malformed schema before validating against it
        try:
            jsonschema.Draft7Validator.check_schema(schema)
        except jsonschema.SchemaError as e:
            return False, [f"Schema error: {str(e)}"]
        # Collect every violation instead of stopping at the first one
        validator = jsonschema.Draft7Validator(schema)
        errors = [
            self.format_validation_error(e) for e in validator.iter_errors(structure)
        ]
        return len(errors) == 0, errors

    def format_validation_error(self, error: jsonschema.ValidationError) -> str:
        """Format validation error with helpful message"""
        path = " → ".join(str(p) for p in error.path) if error.path else "root"
        message = f"Validation error at '{path}':\n"
        message += f"  Issue: {error.message}\n"
        # Add helpful suggestions
        if error.validator == "required":
            # validator_value lists all required fields; error.message names the missing one
            required = ", ".join(error.validator_value)
            message += f"  Fix: Add the missing field (required: {required})\n"
        elif error.validator == "enum":
            message += f"  Fix: Use one of: {', '.join(map(str, error.validator_value))}\n"
        elif error.validator == "minLength":
            message += f"  Fix: Provide at least {error.validator_value} characters\n"
        elif error.validator == "minItems":
            message += f"  Fix: Provide at least {error.validator_value} items\n"
        return message

    def validate_cross_references(
        self,
        spec_path: Path,
        work_item_id: str
    ) -> tuple[bool, List[str]]:
        """Validate cross-references (dependencies exist, files exist, etc.)"""
        from solokit.work_items.repository import WorkItemRepository
        structure = self.parse_spec_to_structure(spec_path)
        errors = []
        # Validate dependencies exist
        dependencies = structure.get("dependencies", [])
        if dependencies:
            repository = WorkItemRepository()
            existing_work_items = {wi["id"] for wi in repository.list_work_items()}
            for dep in dependencies:
                # Check self-reference first so it is reported even when the item exists
                if dep == work_item_id:
                    errors.append("Work item cannot depend on itself")
                elif dep not in existing_work_items:
                    errors.append(f"Dependency '{dep}' does not exist")
        # Validate file references (if mentioned)
        details = structure.get("implementation_details", {})
        files_to_modify = details.get("files_to_modify", []) if isinstance(details, dict) else []
        if isinstance(files_to_modify, str):
            # parse_structured_section yields a multi-line string: one path per line
            files_to_modify = [
                p.lstrip("-* ").strip() for p in files_to_modify.splitlines() if p.strip()
            ]
        for file_path in files_to_modify:
            if not Path(file_path).exists():
                errors.append(f"File to modify does not exist: {file_path}")
        return len(errors) == 0, errors
Enhanced spec validator:
# src/solokit/work_items/spec_validator.py (enhanced)
from pathlib import Path
from typing import List

from solokit.work_items.spec_parser import SpecParser


class SpecValidator:
    def __init__(self):
        self.parser = SpecParser()

    def validate_spec(
        self,
        spec_path: Path,
        work_item_type: str,
        work_item_id: str
    ) -> tuple[bool, List[str]]:
        """Comprehensive spec validation"""
        all_errors = []
        # 1. Schema validation
        _, schema_errors = self.parser.validate_spec_structure(
            spec_path, work_item_type
        )
        all_errors.extend(schema_errors)
        # 2. Cross-reference validation
        _, ref_errors = self.parser.validate_cross_references(
            spec_path, work_item_id
        )
        all_errors.extend(ref_errors)
        # 3. Custom business rules
        _, rule_errors = self.validate_business_rules(
            spec_path, work_item_type
        )
        all_errors.extend(rule_errors)
        return len(all_errors) == 0, all_errors

    def validate_business_rules(
        self,
        spec_path: Path,
        work_item_type: str
    ) -> tuple[bool, List[str]]:
        """Validate custom business rules"""
        structure = self.parser.parse_spec_to_structure(spec_path)
        errors = []
        # Spike-specific rules
        if work_item_type == "spike":
            time_box = structure.get("time_box", "")
            if not time_box:
                errors.append("Spike must have a time box defined")
            # parse_time_box_hours is a helper that is not shown in this proposal
            elif self.parse_time_box_hours(time_box) > 40:
                errors.append(f"Spike time box too long: {time_box} (max 40 hours)")
        # Security work item rules
        if work_item_type == "security":
            threat_model = structure.get("threat_model", "")
            if not threat_model or len(threat_model) < 50:
                errors.append("Security work item must have detailed threat model")
        return len(errors) == 0, errors
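The spike rule above calls `parse_time_box_hours`, which is not defined in the listing. A minimal sketch of such a helper follows. It is an assumption, not the planned implementation: it accepts only simple forms like "2 days" or "8 hours" (the examples given in the spike template) and treats one day as eight working hours.

```python
import re

HOURS_PER_DAY = 8  # assumption: one working day counts as 8 hours


def parse_time_box_hours(time_box: str) -> float:
    """Convert text such as '2 days' or '8 hours' into a number of hours."""
    match = re.fullmatch(
        r"\s*(\d+(?:\.\d+)?)\s*(hours?|hrs?|h|days?|d)\s*", time_box.lower()
    )
    if match is None:
        raise ValueError(f"Unrecognized time box: {time_box!r}")
    amount, unit = float(match.group(1)), match.group(2)
    # Day-based units are converted; hour-based units are returned as given.
    return amount * HOURS_PER_DAY if unit.startswith("d") else amount


print(parse_time_box_hours("2 days"))   # prints 16.0
print(parse_time_box_hours("8 hours"))  # prints 8.0
```

Raising on unrecognized input, rather than returning zero, keeps a malformed time box from silently passing the 40-hour check.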
Files Affected:
New:
- `src/solokit/templates/schemas/feature_spec_schema.json` - Feature spec schema (will be created)
- `src/solokit/templates/schemas/bug_spec_schema.json` - Bug spec schema (will be created)
- `src/solokit/templates/schemas/refactor_spec_schema.json` - Refactor spec schema (will be created)
- `src/solokit/templates/schemas/security_spec_schema.json` - Security spec schema (will be created)
- `src/solokit/templates/schemas/integration_test_spec_schema.json` - Integration test schema (will be created)
- `src/solokit/templates/schemas/deployment_spec_schema.json` - Deployment spec schema (will be created)
- `tests/unit/test_spec_parser_schema.py` - Schema validation tests (will be created)
- `tests/fixtures/specs/` - Test spec fixtures (will be created)
Modified:
- `src/solokit/work_items/spec_parser.py` - Enhanced with schema validation
- `src/solokit/work_items/spec_validator.py` - Use schema validation
- `src/solokit/work_items/creator.py` - Validate on work item creation
- `pyproject.toml` - Add jsonschema dependency
Benefits:
- Earlier error detection: Catch spec issues during creation, not during session
- Better error messages: Precise location and suggested fixes
- Type safety: Ensure fields have correct types and formats
- Consistency: All specs follow standard structure
- Extensibility: Easy to add new validation rules
- Migration support: Can evolve spec format over time
- Documentation: Schema serves as spec documentation
- IDE support: Schemas enable autocomplete in IDEs
Priority: Medium-High - Quality improvement, prevents errors
Notes:
- Backward compatible: Existing specs validated, warnings shown but not blocked
- Migration tool can convert old specs to new schema
- Schemas can evolve with version numbers
- Custom types (Enhancement #24) can define their own schemas
Enhancement #24: Custom Work Item Types
Status: 🔵 IDENTIFIED
Problem:
Solokit currently supports only 6 fixed work item types (feature, bug, refactor, security, integration_test, deployment). This creates limitations:
- No project-specific types: Different projects need different work item types (spike, research, documentation-task, data-migration, experiment, etc.)
- No extensibility: Users cannot define custom types for their workflow
- Rigid structure: Solo developers may want simpler or more specialized types
- Missing common types: Common software development activities like "spike" (time-boxed investigation) or "research" have no dedicated type
Example use cases:
Solo developer working on data-intensive project:
- Needs: data-migration, data-validation, schema-evolution work item types
- Current: Must use "feature" or "refactor" which don't fit semantically
Solo developer doing R&D:
- Needs: spike, research, experiment, proof-of-concept types
- Current: No appropriate type, forced to use "feature"
Solo developer maintaining docs:
- Needs: documentation-task, tutorial, guide types
- Current: No dedicated documentation type
Proposed Solution:
Implement custom work item type system allowing users to define their own work item types with:
- User-Defined Type Schema
  - Define custom type name and metadata
  - Specify required and optional spec sections
  - Set default priority and milestone behavior
  - Configure type-specific quality gates
- Custom Spec Templates
  - Create custom spec templates for each type
  - Define type-specific validation rules
  - Include type-specific guidance and examples
  - Template variables for dynamic content
- Type-Specific Quality Gates
  - Different quality gate requirements per type
  - Example: "spike" type may not require tests
  - Example: "documentation-task" may only require linting and grammar checks
  - Configurable gate strictness per type
- Type Lifecycle Configuration
  - Define valid status transitions per type
  - Set default session behavior (single-session vs multi-session)
  - Configure completion criteria
  - Set up type-specific git branch naming patterns
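Per-type status transitions could be expressed as a lookup table. The sketch below is hypothetical: the status names are the ones used elsewhere in this document, but the set of legal moves and the spike override are invented to show the mechanism.

```python
# Assumed default table: which statuses may follow each status.
DEFAULT_TRANSITIONS = {
    "not_started": {"in_progress"},
    "in_progress": {"blocked", "completed"},
    "blocked": {"in_progress"},
    "completed": set(),
}

# Illustrative override: a spike may be closed without ever being started,
# for example when the question it asks has been answered elsewhere.
TYPE_TRANSITIONS = {
    "spike": {**DEFAULT_TRANSITIONS, "not_started": {"in_progress", "completed"}},
}


def can_transition(work_item_type: str, current: str, new: str) -> bool:
    """Return True if the type allows moving from `current` to `new`."""
    table = TYPE_TRANSITIONS.get(work_item_type, DEFAULT_TRANSITIONS)
    return new in table.get(current, set())


print(can_transition("feature", "not_started", "completed"))  # prints False
print(can_transition("spike", "not_started", "completed"))    # prints True
```

Types without an entry fall back to the default table, so a custom type only has to declare the transitions that differ.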
Implementation:
Custom type definition:
# .session/config.json - custom_work_item_types section
{
"custom_work_item_types": {
"spike": {
"display_name": "Spike",
"description": "Time-boxed investigation or research task",
"template_file": "spike_spec.md",
"required_sections": [
"Goal",
"Time Box",
"Questions to Answer",
"Findings",
"Recommendations"
],
"optional_sections": [
"References",
"Experiments Conducted"
],
"quality_gates": {
"tests": {"enabled": false, "required": false},
"linting": {"enabled": false, "required": false},
"documentation": {"enabled": true, "required": true}
},
"default_priority": "medium",
"typical_duration_days": 2,
"multi_session_allowed": false,
"branch_prefix": "spike"
},
"data_migration": {
"display_name": "Data Migration",
"description": "Database schema or data migration task",
"template_file": "data_migration_spec.md",
"required_sections": [
"Migration Goal",
"Current Schema",
"Target Schema",
"Data Transformation",
"Rollback Plan",
"Testing Strategy"
],
"quality_gates": {
"tests": {"enabled": true, "required": true, "coverage_threshold": 95},
"integration_tests": {"enabled": true, "required": true},
"rollback_test": {"enabled": true, "required": true},
"backup_verification": {"enabled": true, "required": true}
},
"default_priority": "high",
"multi_session_allowed": true,
"branch_prefix": "migration"
},
"documentation_task": {
"display_name": "Documentation Task",
"description": "Documentation writing or updating",
"template_file": "documentation_task_spec.md",
"required_sections": [
"Documentation Goal",
"Target Audience",
"Content Outline",
"Examples Required"
],
"quality_gates": {
"tests": {"enabled": false, "required": false},
"linting": {"enabled": true, "required": true},
"grammar_check": {"enabled": true, "required": true},
"link_validation": {"enabled": true, "required": true},
"documentation": {"enabled": false, "required": false}
},
"default_priority": "low",
"multi_session_allowed": false,
"branch_prefix": "docs"
},
"experiment": {
"display_name": "Experiment",
"description": "Experimental feature or proof of concept",
"template_file": "experiment_spec.md",
"required_sections": [
"Hypothesis",
"Success Criteria",
"Experiment Design",
"Results",
"Conclusion"
],
"quality_gates": {
"tests": {"enabled": false, "required": false},
"documentation": {"enabled": true, "required": true}
},
"default_priority": "low",
"typical_duration_days": 3,
"multi_session_allowed": false,
"branch_prefix": "experiment"
}
}
}
Custom spec template example:
# src/solokit/templates/spike_spec.md
---
type: spike
---
# [Spike Title]
**Type:** Spike
**Time Box:** [e.g., 2 days, 8 hours]
**Created:** [Auto-generated]
## Goal
What question are you trying to answer? What are you investigating?
## Questions to Answer
1. [Question 1]
2. [Question 2]
3. [Question 3]
## Approach
How will you conduct this investigation?
- [ ] Research approach 1
- [ ] Experiment 2
- [ ] Prototype 3
## Findings
*(To be filled during/after spike)*
### What We Learned
- Finding 1
- Finding 2
### What We Don't Know Yet
- Unknown 1
- Unknown 2
## Recommendations
Based on findings, what should we do next?
- [ ] Recommendation 1: [Create feature work item / Continue research / Abandon approach]
- [ ] Recommendation 2
## References
- [External resources, articles, documentation]
## Time Tracking
- Time spent: [e.g., 6 hours out of 8 hour time box]
- Time box respected: [Yes/No]
Type manager:
# src/solokit/work_items/type_manager.py
from pathlib import Path


class WorkItemTypeManager:
    # load_config, save_config, load_built_in_types and load_custom_types are not shown here

    def __init__(self, config_path=".session/config.json"):
        self.config = self.load_config(config_path)
        self.built_in_types = self.load_built_in_types()
        self.custom_types = self.load_custom_types()

    def get_all_types(self):
        """Return all available work item types (built-in + custom)"""
        # Built-in entries are applied last so a custom type cannot shadow them
        return {**self.custom_types, **self.built_in_types}

    def get_type_config(self, type_name):
        """Get configuration for a specific work item type"""
        all_types = self.get_all_types()
        if type_name not in all_types:
            raise ValueError(f"Unknown work item type: {type_name}")
        return all_types[type_name]

    def validate_type_definition(self, type_config):
        """Validate custom type configuration"""
        required_fields = ["display_name", "description", "template_file",
                           "required_sections", "quality_gates"]
        for field in required_fields:
            if field not in type_config:
                raise ValueError(f"Custom type missing required field: {field}")
        # Validate template file exists
        template_path = Path("src/solokit/templates") / type_config["template_file"]
        if not template_path.exists():
            raise FileNotFoundError(f"Template not found: {template_path}")
        return True

    def create_custom_type(self, type_name, type_config):
        """Create a new custom work item type"""
        # Built-in types cannot be modified (see Notes)
        if type_name in self.built_in_types:
            raise ValueError(f"Cannot redefine built-in type: {type_name}")
        self.validate_type_definition(type_config)
        # Add to config
        if "custom_work_item_types" not in self.config:
            self.config["custom_work_item_types"] = {}
        self.config["custom_work_item_types"][type_name] = type_config
        self.save_config()
        return type_name

    def get_quality_gates_for_type(self, type_name):
        """Get quality gate configuration for work item type"""
        type_config = self.get_type_config(type_name)
        return type_config.get("quality_gates", {})

    def get_required_sections_for_type(self, type_name):
        """Get required spec sections for work item type"""
        type_config = self.get_type_config(type_name)
        return type_config.get("required_sections", [])
Enhanced work item creation:
# src/solokit/work_items/creator.py (modified)
def create_work_item(self, work_item_id, work_item_type, **kwargs):
    """Create work item with support for custom types"""
    type_manager = WorkItemTypeManager()
    # Validate type exists (built-in or custom)
    if work_item_type not in type_manager.get_all_types():
        available_types = ", ".join(type_manager.get_all_types().keys())
        raise ValueError(f"Unknown type '{work_item_type}'. Available: {available_types}")
    # Get type configuration
    type_config = type_manager.get_type_config(work_item_type)
    # Create work item with type-specific defaults
    work_item = {
        "id": work_item_id,
        "type": work_item_type,
        "priority": kwargs.get("priority", type_config.get("default_priority", "medium")),
        "status": "not_started",
        # ... rest of work item creation
    }
    # Generate spec from type-specific template
    spec_content = self.generate_spec_from_template(
        template=type_config["template_file"],
        work_item=work_item
    )
    # Save spec file
    spec_path = Path(f".session/specs/{work_item_id}.md")
    spec_path.write_text(spec_content)
    return work_item
Quality gates integration:
# src/solokit/quality/gates.py (modified)
def get_gates_for_work_item(self, work_item):
    """Get quality gates based on work item type"""
    type_manager = WorkItemTypeManager()
    type_config = type_manager.get_type_config(work_item["type"])
    # Get type-specific quality gate configuration
    type_gates = type_config.get("quality_gates", {})
    # Merge with default gates, type-specific takes precedence
    gates = self.default_gates.copy()
    gates.update(type_gates)
    return gates
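The `gates.update(type_gates)` call above replaces each gate's settings wholesale, so a type that only sets `enabled` and `required` would drop any other default for that gate. If a per-key merge is wanted instead, it could look like the sketch below. `merge_gates` is a hypothetical helper, and the default values (including `coverage_threshold: 80`) are assumed for the example.

```python
from copy import deepcopy


def merge_gates(default_gates: dict, type_gates: dict) -> dict:
    """Merge per gate so a type override changes only the keys it names."""
    merged = deepcopy(default_gates)  # never mutate the shared defaults
    for gate, overrides in type_gates.items():
        merged.setdefault(gate, {}).update(overrides)
    return merged


# Assumed defaults, for illustration only.
defaults = {
    "tests": {"enabled": True, "required": True, "coverage_threshold": 80},
    "linting": {"enabled": True, "required": True},
}
spike_gates = {"tests": {"enabled": False, "required": False}}

merged = merge_gates(defaults, spike_gates)
print(merged["tests"])
# prints {'enabled': False, 'required': False, 'coverage_threshold': 80}
```

The deep copy also avoids a side effect of the shallow `copy()`: nested gate dictionaries would otherwise be shared between the defaults and every merged result.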
Commands:
# List all available work item types (built-in + custom)
/sk:work-types
# Create custom work item type interactively
/sk:work-type-create
# Create custom work item type from file
/sk:work-type-create --from-file .solokit/custom_types/spike.yml
# Validate custom type definition
/sk:work-type-validate --type spike
# Show details of a work item type
/sk:work-type-show spike
Files Affected:
New:
- `src/solokit/work_items/type_manager.py` - Custom type management (will be created)
- `src/solokit/templates/spike_spec.md` - Spike spec template (will be created)
- `src/solokit/templates/data_migration_spec.md` - Data migration template (will be created)
- `src/solokit/templates/documentation_task_spec.md` - Documentation task template (will be created)
- `src/solokit/templates/experiment_spec.md` - Experiment template (will be created)
- `.claude/commands/work-types.md` - List types command (will be created)
- `.claude/commands/work-type-create.md` - Create custom type command (will be created)
- `.claude/commands/work-type-show.md` - Show type details command (will be created)
- `tests/unit/test_type_manager.py` - Type manager tests (will be created)
- `tests/e2e/test_custom_work_item_types.py` - Custom type E2E tests (will be created)
Modified:
- `src/solokit/work_items/creator.py` - Support custom types in creation
- `src/solokit/work_items/spec_validator.py` - Validate against type-specific requirements
- `src/solokit/quality/gates.py` - Type-specific quality gates
- `.session/config.json` - Add custom_work_item_types section
- `.claude/commands/work-new.md` - Document custom type support
Benefits:
- Project flexibility: Adapt Solokit to any project's workflow and terminology
- Better semantics: Use work item types that match the actual work being done
- Workflow optimization: Different quality gates for different work types
- Common patterns: Support common types like spike, research, experiment
- Solo developer friendly: Simpler types for simple projects, complex for complex
- Extensibility: Framework grows with user needs
- Type safety: Validation ensures custom types are well-formed
- Documentation: Custom templates guide users through unfamiliar work types
Priority: High - Extensibility is foundational for framework adoption
Notes:
- Custom types stored in `.session/config.json` for project-specific customization
- Built-in types cannot be modified (ensures backward compatibility)
- Template variables allow dynamic content generation
- Type-specific quality gates prevent inappropriate requirements (e.g., no tests for documentation)
- Community could share custom type definitions
Enhancement #25: MCP Server Integration
Status: 🔵 IDENTIFIED
Problem:
Current Solokit-Claude Code integration is via slash commands that execute CLI commands and return text output. This creates limitations:
- Text-only output: All Solokit data must be formatted as text for stdout/stderr
- No programmatic access: Claude cannot query Solokit state directly
- Parsing overhead: Claude must parse text output to understand Solokit data
- Limited interactivity: Cannot have rich, interactive conversations about Solokit state
- No structured data: JSON/structured data must be formatted as text then parsed
- Foundation missing: Cannot build advanced features like inline annotations without programmatic access
Example of current limitation:
User: "What learnings are relevant to authentication?"
Current flow:
1. User must use /learn-search authentication
2. CLI returns text output
3. Claude reads and interprets text
4. Claude formats response to user
Desired flow with MCP:
1. Claude directly queries: solokit://learnings/search?query=authentication&limit=10
2. Receives structured JSON response
3. Claude analyzes and presents insights
4. Can follow up with related queries programmatically
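The `solokit://` address in the desired flow is ordinary URI syntax, so a server could decompose it with the standard library. The sketch below assumes a mapping the document does not specify: the host part is treated as the resource and the path as the action. `parse_solokit_uri` is a hypothetical name.

```python
from urllib.parse import parse_qs, urlparse


def parse_solokit_uri(uri: str) -> dict:
    """Split a solokit:// URI into resource, action and query parameters."""
    parts = urlparse(uri)
    if parts.scheme != "solokit":
        raise ValueError(f"Not a solokit URI: {uri!r}")
    # parse_qs returns a list per key; keep the first value of each.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "resource": parts.netloc,
        "action": parts.path.strip("/"),
        "params": params,
    }


request = parse_solokit_uri("solokit://learnings/search?query=authentication&limit=10")
print(request)
# prints {'resource': 'learnings', 'action': 'search',
#         'params': {'query': 'authentication', 'limit': '10'}}
```

Query values arrive as strings, so a numeric parameter such as `limit` still has to be converted and range-checked before use.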
Proposed Solution:
Implement MCP (Model Context Protocol) server for Solokit that exposes Solokit operations as structured tools:
- MCP Server Implementation
  - Standalone MCP server process
  - Exposes Solokit operations as MCP tools
  - Returns structured data (JSON) instead of text
  - Handles concurrent requests
  - Maintains session state
- MCP Tools for Solokit Operations
  - Work item operations (list, get, create, update, delete)
  - Learning operations (search, get, create, curate)
  - Session operations (status, start, end, validate)
  - Quality gate operations (run, get results)
  - Visualization operations (dependency graph)
  - Project operations (status, metrics)
- Rich Data Structures
  - Typed responses (not string parsing)
  - Nested objects for complex data
  - Metadata and context in responses
  - Error handling with structured error objects
- Real-Time State Access
  - Query Solokit state anytime
  - No need to run CLI commands
  - Efficient data access
  - Caching for performance
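"Typed responses" and "structured error objects" could take the form of a small response envelope. The sketch below is one possible design under assumptions: the `ToolResponse` and `ToolError` classes, the error code string, and the in-memory `store` are all hypothetical and stand in for whatever the server would actually use.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class ToolError:
    code: str                      # machine-readable identifier
    message: str                   # human-readable explanation
    details: dict[str, Any] = field(default_factory=dict)


@dataclass
class ToolResponse:
    ok: bool
    data: Optional[dict[str, Any]] = None
    error: Optional[ToolError] = None

    def to_dict(self) -> dict[str, Any]:
        """Convert to plain dicts, ready for JSON serialization."""
        return asdict(self)


def get_work_item(store: dict[str, dict], work_item_id: str) -> ToolResponse:
    """Look up a work item and report a structured error if it is missing."""
    item = store.get(work_item_id)
    if item is None:
        return ToolResponse(ok=False, error=ToolError(
            code="work_item_not_found",
            message=f"No work item with id {work_item_id!r}",
            details={"work_item_id": work_item_id},
        ))
    return ToolResponse(ok=True, data={"work_item": item})


store = {"feature_auth": {"id": "feature_auth", "status": "in_progress"}}
print(get_work_item(store, "feature_auth").to_dict())
print(get_work_item(store, "bug_x").to_dict())
```

With a fixed envelope the caller can branch on `ok` and `error.code` instead of matching on message text.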
Implementation:
MCP Server:
# src/solokit/mcp/server.py
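import asyncio
from pathlib import Path
from typing import Any, Dict, List
# Illustrative API shape: the Server/Tool names and the add_tool call are a sketch and may not match the published MCP SDK
from mcp import Server, Tool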
class SDDMCPServer:
def __init__(self):
self.server = Server("solokit")
self.register_tools()
def register_tools(self):
"""Register all Solokit tools with MCP server"""
# Work item tools
self.server.add_tool(Tool(
name="sdd_work_items_list",
description="List all work items with optional filters",
parameters={
"type": "object",
"properties": {
"status": {"type": "string", "enum": ["not_started", "in_progress", "blocked", "completed"]},
"type": {"type": "string"},
"milestone": {"type": "string"}
}
},
handler=self.list_work_items
))
self.server.add_tool(Tool(
name="sdd_work_item_get",
description="Get detailed information about a specific work item",
parameters={
"type": "object",
"properties": {
"work_item_id": {"type": "string"}
},
# JSON Schema declares required fields at the object level
"required": ["work_item_id"]
},
handler=self.get_work_item
))
# Learning tools
self.server.add_tool(Tool(
name="sdd_learnings_search",
description="Search learnings by keyword or semantic query",
parameters={
"type": "object",
"properties": {
"query": {"type": "string"},
"limit": {"type": "integer", "default": 10},
"category": {"type": "string"},
"semantic": {"type": "boolean", "default": False}
},
# JSON Schema declares required fields at the object level
"required": ["query"]
},
handler=self.search_learnings
))
self.server.add_tool(Tool(
name="sdd_learnings_relevant",
description="Get learnings relevant to a work item or topic",
parameters={
"type": "object",
"properties": {
"work_item_id": {"type": "string"},
"topic": {"type": "string"},
"limit": {"type": "integer", "default": 10}
}
},
handler=self.get_relevant_learnings
))
# Session tools
self.server.add_tool(Tool(
name="sdd_session_status",
description="Get current session status and progress",
parameters={"type": "object", "properties": {}},
handler=self.get_session_status
))
self.server.add_tool(Tool(
name="sdd_quality_gates_results",
description="Get quality gate results for current or past sessions",
parameters={
"type": "object",
"properties": {
"session_id": {"type": "string"},
"work_item_id": {"type": "string"}
}
},
handler=self.get_quality_gate_results
))
# Visualization tools
self.server.add_tool(Tool(
name="sdd_dependency_graph",
description="Get work item dependency graph data",
parameters={
"type": "object",
"properties": {
"format": {"type": "string", "enum": ["json", "ascii", "dot"], "default": "json"},
"focus": {"type": "string"},
"include_completed": {"type": "boolean", "default": False}
}
},
handler=self.get_dependency_graph
))
# Project metrics tools
self.server.add_tool(Tool(
name="sdd_project_metrics",
description="Get project-level metrics and statistics",
parameters={
"type": "object",
"properties": {
"metric_type": {"type": "string", "enum": ["velocity", "quality", "learnings", "all"], "default": "all"}
}
},
handler=self.get_project_metrics
))
async def list_work_items(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""List work items with filters"""
from solokit.work_items.repository import WorkItemRepository
repository = WorkItemRepository()
work_items = repository.list_work_items(
status=params.get("status"),
work_type=params.get("type"),
milestone=params.get("milestone")
)
return {
"work_items": work_items,
"total": len(work_items),
"filters_applied": {k: v for k, v in params.items() if v is not None}
}
async def get_work_item(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""Get detailed work item information"""
from solokit.work_items.repository import WorkItemRepository
repository = WorkItemRepository()
work_item_id = params["work_item_id"]
# Get work item metadata
work_item = repository.get_work_item(work_item_id)
# Get spec content
spec_path = Path(f".session/specs/{work_item_id}.md")
spec_content = spec_path.read_text() if spec_path.exists() else None
# Get session history
sessions = repository.get_work_item_sessions(work_item_id)
return {
"work_item": work_item,
"spec_content": spec_content,
"sessions": sessions,
"dependency_info": repository.get_dependency_info(work_item_id)
}
async def search_learnings(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""Search learnings"""
from solokit.learning.repository import LearningRepository
repository = LearningRepository()
query = params["query"]
limit = params.get("limit", 10)
category = params.get("category")
semantic = params.get("semantic", False)
if semantic:
# Use AI-powered semantic search (Enhancement #37)
results = repository.semantic_search(query, limit=limit, category=category)
else:
# Use keyword search
results = repository.search(query, limit=limit, category=category)
return {
"learnings": results,
"total": len(results),
"query": query,
"search_type": "semantic" if semantic else "keyword"
}
async def get_relevant_learnings(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""Get learnings relevant to work item or topic"""
from solokit.session.briefing.learning_loader import get_relevant_learnings
work_item_id = params.get("work_item_id")
topic = params.get("topic")
limit = params.get("limit", 10)
if work_item_id:
# Get learnings for work item
learnings = get_relevant_learnings(work_item_id, limit=limit)
context = f"work item: {work_item_id}"
elif topic:
# Get learnings for topic
learnings = get_relevant_learnings(topic, limit=limit)
context = f"topic: {topic}"
else:
return {"error": "Must provide work_item_id or topic"}
return {
"learnings": learnings,
"total": len(learnings),
"context": context
}
async def get_session_status(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""Get current session status"""
from solokit.session.status import get_session_status
status = get_session_status()
return status
async def get_quality_gate_results(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""Get quality gate results"""
from solokit.quality.gates import QualityGateRunner
runner = QualityGateRunner()
session_id = params.get("session_id")
work_item_id = params.get("work_item_id")
results = runner.get_results(session_id=session_id, work_item_id=work_item_id)
return {
"quality_gate_results": results,
"session_id": session_id,
"work_item_id": work_item_id
}
async def get_dependency_graph(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""Get dependency graph"""
from solokit.visualization.dependency_graph import DependencyGraph
graph = DependencyGraph()
format_type = params.get("format", "json")
focus = params.get("focus")
include_completed = params.get("include_completed", False)
graph_data = graph.generate(
format=format_type,
focus=focus,
include_completed=include_completed
)
return {
"graph": graph_data,
"format": format_type,
"metadata": graph.get_metadata()
}
async def get_project_metrics(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""Get project metrics"""
from solokit.improvement.dora_metrics import DORAMetrics
from solokit.improvement.velocity_tracker import VelocityTracker
metric_type = params.get("metric_type", "all")
metrics = {}
if metric_type in ["velocity", "all"]:
velocity = VelocityTracker()
metrics["velocity"] = velocity.get_metrics()
if metric_type in ["quality", "all"]:
dora = DORAMetrics()
metrics["dora"] = dora.get_metrics()
if metric_type in ["learnings", "all"]:
from solokit.learning.curator import LearningCurator
curator = LearningCurator()
metrics["learnings"] = curator.get_statistics()
return {
"metrics": metrics,
"metric_type": metric_type
}
async def start(self):
"""Start MCP server"""
await self.server.start()
# Entry point
async def main():
server = SDDMCPServer()
await server.start()
if __name__ == "__main__":
asyncio.run(main())
Claude Code MCP Configuration:
// ~/.claude/config.json or project-specific config
{
"mcpServers": {
"solokit": {
"command": "solokit",
"args": ["mcp", "serve"],
"env": {}
}
}
}
CLI command to start MCP server:
# Start MCP server
solokit mcp serve
# Or via Claude Code (auto-started)
Example usage in Claude Code:
User: "What learnings do we have about authentication?"
Claude (internal):
→ Calls sdd_learnings_search(query="authentication", limit=10, semantic=True)
→ Receives structured JSON with learnings
→ Analyzes and presents to user
Claude (to user): "We have 7 learnings about authentication:
1. Always use bcrypt for password hashing (Security - Session 5)
2. Implement JWT token refresh mechanism (Best Practices - Session 8)
..."
Files Affected:
New:
- src/solokit/mcp/server.py (will be created) - MCP server implementation
- src/solokit/mcp/tools.py (will be created) - MCP tool definitions
- src/solokit/mcp/__init__.py (will be created) - MCP module initialization
- docs/mcp/README.md (will be created) - MCP integration documentation
- docs/mcp/tools.md (will be created) - MCP tools reference
- tests/unit/test_mcp_server.py (will be created) - MCP server tests
- tests/integration/test_mcp_integration.py (will be created) - Integration tests
Modified:
- src/solokit/cli.py - Add MCP server command
- README.md - Document MCP integration
- .claude/config.json - MCP server configuration example
Benefits:
- Programmatic access: Claude can query Solokit state directly
- Structured data: No text parsing, clean JSON responses
- Rich interactions: Contextual follow-up queries
- Foundation for features: Enables inline annotations, real-time updates
- Better UX: Faster, more accurate responses
- Extensibility: Easy to add new MCP tools
- Standard protocol: MCP is Claude Code's official integration method
- Stateful: Server maintains context across queries
Priority: High - Foundation for better Claude Code integration (required for Enhancement #35)
Notes:
- Requires Claude Code with MCP support
- MCP server runs as separate process
- Server lifecycle managed by Claude Code
- Backward compatible (slash commands still work)
- Performance: MCP calls are expected to be faster than CLI commands, since the server process stays running between calls
Enhancement #26: Context-Aware MCP Server Management
Status: 🔵 IDENTIFIED
Problem:
MCP (Model Context Protocol) servers provide valuable capabilities like accessing documentation (context7), visual testing (playwright), database querying, and more. However, they have significant limitations:
- Token Consumption:
  - MCP servers consume context tokens even when idle
  - Not all servers are relevant to every work item
  - Context window fills up with unused server definitions
  - Reduces space available for code and briefing content
- Manual Management:
  - Developers must manually configure MCP servers per project
  - Must remember which servers are useful for which tasks
  - No systematic way to enable/disable servers per session
  - Server selection is reactive, not proactive
- No Context Awareness:
  - Same servers enabled for frontend and backend work
  - Playwright enabled during database optimization work (irrelevant)
  - Documentation servers for wrong tech stack
  - No intelligent server selection based on work context
- Setup Friction:
  - Setting up MCP servers requires manual configuration
  - No project templates include MCP server setup
  - Developers may not know which servers are useful
  - Configuration is project-specific but not automated
Proposed Solution:
Implement context-aware MCP server management that automatically enables relevant servers based on work item context:
- Project-Level Server Registry
  - Define available MCP servers during sk init
  - Servers registered but not enabled by default (zero token cost)
  - Server metadata: name, purpose, tech stack, use cases
  - Template-based server recommendations
- Context-Aware Enablement
  - sk start analyzes work item and enables relevant servers
  - Smart matching based on work item type, tags, tech stack, dependencies
  - Token budgeting: enable servers within context budget
  - Priority-based selection when budget is limited
- Intelligent Server Selection
  - Frontend work → Enable playwright, context7 (frontend frameworks)
  - Backend work → Enable database tools, API testing tools
  - Security work → Enable security scanning tools
  - Documentation work → Enable context7 for all relevant frameworks
- Manual Override
  - Explicit enable/disable via flags
  - Session-specific server configuration
  - Persistent preferences for specific work item types
Implementation:
MCP server registry schema:
// .session/mcp_servers.json
{
"servers": [
{
"id": "context7",
"name": "Context7",
"description": "Access up-to-date documentation for any framework or library",
"command": "npx",
"args": ["-y", "@context7/mcp"],
"enabled_by_default": false,
"relevance_rules": {
"work_item_types": ["feature", "bug", "refactor"],
"tech_stack": ["*"], // All tech stacks
"tags": ["documentation", "learning", "new-technology"],
"priority": "high"
},
"token_cost_estimate": 500,
"env": {}
},
{
"id": "playwright",
"name": "Playwright MCP",
"description": "Visual testing and screenshot capture for frontend work",
"command": "npx",
"args": ["-y", "@playwright/mcp-server"],
"enabled_by_default": false,
"relevance_rules": {
"work_item_types": ["feature", "bug"],
"tech_stack": ["react", "vue", "svelte", "angular", "next.js", "nuxt"],
"tags": ["frontend", "ui", "visual", "responsive"],
"keywords": ["layout", "design", "visual", "ui", "component", "page"],
"priority": "high"
},
"token_cost_estimate": 800,
"env": {}
},
{
"id": "postgres",
"name": "PostgreSQL MCP",
"description": "Query and inspect PostgreSQL databases",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-postgres"],
"enabled_by_default": false,
"relevance_rules": {
"work_item_types": ["feature", "bug", "refactor", "performance"],
"tech_stack": ["postgresql", "postgres"],
"tags": ["database", "backend", "data", "migration"],
"keywords": ["query", "database", "sql", "migration", "schema"],
"priority": "medium"
},
"token_cost_estimate": 600,
"env": {
"POSTGRES_CONNECTION_STRING": "postgresql://user:password@localhost:5432/db"
}
},
{
"id": "filesystem",
"name": "Filesystem MCP",
"description": "Read and manipulate files across the project",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem"],
"enabled_by_default": false,
"relevance_rules": {
"work_item_types": ["feature", "bug", "refactor"],
"tech_stack": ["*"],
"tags": ["refactoring", "large-changes"],
"priority": "low"
},
"token_cost_estimate": 400,
"env": {}
}
],
"global_config": {
"max_token_budget": 2000,
"auto_enable": true,
"prefer_higher_priority": true
}
}
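To make the budget arithmetic concrete: the four sample servers above have estimated costs of 500, 800, 600, and 400 tokens, 2,300 in total, which exceeds the 2,000-token max_token_budget, so they cannot all be enabled at once. The sketch below illustrates that check under stated assumptions: the helper name fit_within_budget is hypothetical, and the input order stands in for whatever relevance ranking is applied.

```python
from typing import Dict, List

# Costs copied from the sample registry above.
SAMPLE_COSTS: Dict[str, int] = {
    "context7": 500,
    "playwright": 800,
    "postgres": 600,
    "filesystem": 400,
}


def fit_within_budget(ranked_ids: List[str], costs: Dict[str, int], budget: int) -> List[str]:
    """Greedily keep servers, in ranked order, while the running total fits."""
    selected: List[str] = []
    used = 0
    for server_id in ranked_ids:
        cost = costs[server_id]
        if used + cost <= budget:
            selected.append(server_id)
            used += cost
    return selected


if __name__ == "__main__":
    ranked = ["context7", "playwright", "postgres", "filesystem"]  # assumed ranking
    print(sum(SAMPLE_COSTS.values()))                     # 2300
    print(fit_within_budget(ranked, SAMPLE_COSTS, 2000))  # ['context7', 'playwright', 'postgres']
```

With this ranking the first three servers use 1,900 tokens, so the 400-token filesystem server is the one left out.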
Server selection algorithm:
# src/solokit/mcp/server_manager.py
from typing import List, Dict, Any
from pathlib import Path
import json


class MCPServerManager:
    """Manage MCP server lifecycle and context-aware enablement"""

    def __init__(self, session_dir: Path = Path(".session")):
        self.session_dir = session_dir
        self.registry_path = session_dir / "mcp_servers.json"

    def load_registry(self) -> Dict[str, Any]:
        """Load MCP server registry"""
        if not self.registry_path.exists():
            return {"servers": [], "global_config": {}}
        return json.loads(self.registry_path.read_text())

    def save_registry(self, registry: Dict[str, Any]):
        """Save MCP server registry"""
        self.registry_path.write_text(json.dumps(registry, indent=2))

    def select_servers_for_work_item(
        self,
        work_item: Dict[str, Any],
        work_item_spec: str,
        tech_stack: List[str]
    ) -> List[Dict[str, Any]]:
        """
        Select relevant MCP servers based on work item context.
        Returns list of servers to enable, sorted by priority.
        """
        registry = self.load_registry()
        servers = registry.get("servers", [])
        global_config = registry.get("global_config", {})
        if not global_config.get("auto_enable", True):
            return []

        # Score each server based on relevance
        scored_servers = []
        for server in servers:
            score = self._score_server_relevance(
                server,
                work_item,
                work_item_spec,
                tech_stack
            )
            if score > 0:
                scored_servers.append((server, score))

        # Sort by score (higher = more relevant)
        scored_servers.sort(key=lambda x: x[1], reverse=True)

        # Apply token budget
        max_budget = global_config.get("max_token_budget", 2000)
        selected_servers = []
        total_tokens = 0
        for server, score in scored_servers:
            token_cost = server.get("token_cost_estimate", 500)
            if total_tokens + token_cost <= max_budget:
                selected_servers.append({
                    **server,
                    "relevance_score": score
                })
                total_tokens += token_cost
            elif global_config.get("prefer_higher_priority", True):
                # Skip if over budget
                continue
        return selected_servers

    def _score_server_relevance(
        self,
        server: Dict[str, Any],
        work_item: Dict[str, Any],
        work_item_spec: str,
        tech_stack: List[str]
    ) -> float:
        """
        Score server relevance (0.0-1.0) based on multiple factors.
        """
        rules = server.get("relevance_rules", {})
        score = 0.0

        # Work item type match
        if work_item["type"] in rules.get("work_item_types", []):
            score += 0.3

        # Tech stack match
        server_stack = rules.get("tech_stack", [])
        if "*" in server_stack or any(t in tech_stack for t in server_stack):
            score += 0.3

        # Tags match
        work_item_tags = work_item.get("tags", [])
        server_tags = rules.get("tags", [])
        tag_overlap = len(set(work_item_tags) & set(server_tags))
        if tag_overlap > 0:
            score += 0.2 * min(tag_overlap / len(server_tags), 1.0)

        # Keyword match in title or spec
        keywords = rules.get("keywords", [])
        text = f"{work_item['title']} {work_item_spec}".lower()
        keyword_matches = sum(1 for kw in keywords if kw.lower() in text)
        if keyword_matches > 0:
            score += 0.2 * min(keyword_matches / len(keywords), 1.0)

        # Priority boost
        priority_boost = {
            "critical": 0.3,
            "high": 0.2,
            "medium": 0.1,
            "low": 0.0
        }
        score += priority_boost.get(rules.get("priority", "low"), 0.0)
        return min(score, 1.0)

    def enable_servers(self, server_ids: List[str]) -> Dict[str, Any]:
        """
        Generate MCP configuration for enabled servers.
        Returns Claude Code MCP config dict.
        """
        registry = self.load_registry()
        servers = registry.get("servers", [])
        mcp_config = {"mcpServers": {}}
        for server in servers:
            if server["id"] in server_ids:
                mcp_config["mcpServers"][server["id"]] = {
                    "command": server["command"],
                    "args": server["args"],
                    "env": server.get("env", {})
                }
        return mcp_config

    def initialize_default_servers(self, tech_stack: List[str]):
        """Initialize server registry with recommended servers for tech stack"""
        default_servers = self._get_default_servers_for_stack(tech_stack)
        registry = {
            "servers": default_servers,
            "global_config": {
                "max_token_budget": 2000,
                "auto_enable": True,
                "prefer_higher_priority": True
            }
        }
        self.save_registry(registry)

    def _get_default_servers_for_stack(self, tech_stack: List[str]) -> List[Dict]:
        """Get recommended MCP servers based on tech stack"""
        # Always include context7 (universal documentation)
        servers = [
            {
                "id": "context7",
                "name": "Context7",
                "description": "Access up-to-date documentation",
                "command": "npx",
                "args": ["-y", "@context7/mcp"],
                "enabled_by_default": False,
                "relevance_rules": {
                    "work_item_types": ["feature", "bug", "refactor"],
                    "tech_stack": ["*"],
                    "tags": ["documentation", "learning"],
                    "priority": "high"
                },
                "token_cost_estimate": 500,
                "env": {}
            }
        ]

        # Add playwright for frontend stacks
        frontend_stacks = ["react", "vue", "svelte", "angular", "next.js", "nuxt"]
        if any(stack in tech_stack for stack in frontend_stacks):
            servers.append({
                "id": "playwright",
                "name": "Playwright MCP",
                "description": "Visual testing and screenshots",
                "command": "npx",
                "args": ["-y", "@playwright/mcp-server"],
                "enabled_by_default": False,
                "relevance_rules": {
                    "work_item_types": ["feature", "bug"],
                    "tech_stack": frontend_stacks,
                    "tags": ["frontend", "ui", "visual"],
                    "keywords": ["layout", "design", "ui", "component"],
                    "priority": "high"
                },
                "token_cost_estimate": 800,
                "env": {}
            })

        # Add database servers
        if "postgresql" in tech_stack or "postgres" in tech_stack:
            servers.append({
                "id": "postgres",
                "name": "PostgreSQL MCP",
                "description": "Query PostgreSQL databases",
                "command": "npx",
                "args": ["-y", "@modelcontextprotocol/server-postgres"],
                "enabled_by_default": False,
                "relevance_rules": {
                    "work_item_types": ["feature", "bug", "refactor"],
                    "tech_stack": ["postgresql", "postgres"],
                    "tags": ["database", "backend"],
                    "keywords": ["query", "database", "sql"],
                    "priority": "medium"
                },
                "token_cost_estimate": 600,
                "env": {}
            })
        return servers
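As a worked example of the relevance weights, the sketch below re-implements the scoring rules from _score_server_relevance as a standalone function and applies them to the Playwright entry from the sample registry. The function name score_rules and both work items are hypothetical, made up for illustration.

```python
from typing import Any, Dict, List

PRIORITY_BOOST = {"critical": 0.3, "high": 0.2, "medium": 0.1, "low": 0.0}

# Relevance rules copied from the "playwright" entry in the sample registry.
PLAYWRIGHT_RULES: Dict[str, Any] = {
    "work_item_types": ["feature", "bug"],
    "tech_stack": ["react", "vue", "svelte", "angular", "next.js", "nuxt"],
    "tags": ["frontend", "ui", "visual", "responsive"],
    "keywords": ["layout", "design", "visual", "ui", "component", "page"],
    "priority": "high",
}


def score_rules(rules: Dict[str, Any], work_item: Dict[str, Any],
                spec: str, tech_stack: List[str]) -> float:
    """Standalone copy of the weights: type 0.3, stack 0.3, tags 0.2, keywords 0.2."""
    total = 0.0
    if work_item["type"] in rules.get("work_item_types", []):
        total += 0.3
    stack = rules.get("tech_stack", [])
    if "*" in stack or any(t in tech_stack for t in stack):
        total += 0.3
    tags = rules.get("tags", [])
    overlap = len(set(work_item.get("tags", [])) & set(tags))
    if overlap:
        total += 0.2 * min(overlap / len(tags), 1.0)
    keywords = rules.get("keywords", [])
    text = f"{work_item['title']} {spec}".lower()
    hits = sum(1 for kw in keywords if kw.lower() in text)
    if hits:
        total += 0.2 * min(hits / len(keywords), 1.0)
    total += PRIORITY_BOOST.get(rules.get("priority", "low"), 0.0)
    return min(total, 1.0)


if __name__ == "__main__":
    # Hypothetical frontend item: type, stack, 2 of 4 tags, 2 of 6 keywords match.
    frontend = {"type": "feature", "title": "Fix login page layout", "tags": ["frontend", "ui"]}
    print(round(score_rules(PLAYWRIGHT_RULES, frontend, "", ["react"]), 2))      # 0.97

    # Hypothetical backend item: nothing matches, only the priority boost applies.
    backend = {"type": "performance", "title": "Optimize slow query", "tags": ["database"]}
    print(round(score_rules(PLAYWRIGHT_RULES, backend, "", ["postgresql"]), 2))  # 0.2
```

The second case shows a property of these weights worth keeping in mind: the priority boost alone gives a "high" server a score of 0.2, so it passes the score > 0 filter even when nothing else matches and is excluded only by the token budget.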
Integration with session start:
# src/solokit/session/briefing/orchestrator.py (enhanced)
from pathlib import Path
from typing import List, Optional


def start_session(
    work_item_id: str,
    enable_servers: Optional[List[str]] = None,
    disable_servers: Optional[List[str]] = None,
):
    """Start development session with context-aware MCP server enablement"""
    from solokit.work_items.repository import WorkItemRepository
    from solokit.mcp.server_manager import MCPServerManager
    from solokit.project.stack import detect_tech_stack

    # Get work item
    repository = WorkItemRepository()
    work_item = repository.get_work_item(work_item_id)

    # Get spec
    spec_path = Path(f".session/specs/{work_item_id}.md")
    spec_content = spec_path.read_text() if spec_path.exists() else ""

    # Detect tech stack
    tech_stack = detect_tech_stack()

    # Select relevant MCP servers
    mcp_manager = MCPServerManager()
    if enable_servers:
        # Manual override
        selected_servers = [s for s in mcp_manager.load_registry()["servers"] if s["id"] in enable_servers]
    else:
        # Automatic selection
        selected_servers = mcp_manager.select_servers_for_work_item(
            work_item,
            spec_content,
            tech_stack
        )

    # Apply disable list
    if disable_servers:
        selected_servers = [s for s in selected_servers if s["id"] not in disable_servers]

    # Generate briefing with MCP server info
    briefing = generate_briefing(work_item_id)

    # Add MCP server section
    if selected_servers:
        briefing += "\n\n## Enabled MCP Servers\n\n"
        briefing += "The following MCP servers are enabled for this session:\n\n"
        for server in selected_servers:
            briefing += f"- **{server['name']}**: {server['description']}\n"
            briefing += f"  - Relevance: {server.get('relevance_score', 0.0):.2f}\n"
        briefing += "\nThese servers were automatically selected based on the work item context.\n"

    # Generate MCP config (for potential future auto-configuration)
    mcp_config = mcp_manager.enable_servers([s["id"] for s in selected_servers])

    # Save session state
    save_session_state({
        "work_item_id": work_item_id,
        "enabled_mcp_servers": [s["id"] for s in selected_servers],
        "mcp_config": mcp_config
    })
    return briefing
Integration with sk init:
# src/solokit/project/init.py (enhanced)
from pathlib import Path
from typing import Optional


def initialize_project(template: Optional[str] = None):
    """Initialize Solokit project with MCP server recommendations"""
    from solokit.mcp.server_manager import MCPServerManager
    from solokit.project.stack import detect_tech_stack

    # Create .session directory
    session_dir = Path(".session")
    session_dir.mkdir(exist_ok=True)

    # Detect tech stack
    tech_stack = detect_tech_stack()

    # Initialize MCP server registry with recommendations
    mcp_manager = MCPServerManager(session_dir)
    mcp_manager.initialize_default_servers(tech_stack)
    print("\n✅ MCP Server Registry initialized")
    print(f"   Recommended servers for {', '.join(tech_stack)}:")
    registry = mcp_manager.load_registry()
    for server in registry["servers"]:
        print(f"   - {server['name']}: {server['description']}")
    print("\n   Servers are disabled by default to save context tokens.")
    print("   They will be automatically enabled during 'sk start' based on work item context.")
    # ... rest of initialization
Commands:
# Start session with automatic server selection
/sk:start WI-001
# Start with manual server override
/sk:start WI-001 --enable-servers context7,playwright
# Start with specific servers disabled
/sk:start WI-001 --disable-servers postgres
# List available MCP servers
/sk:mcp-list
# Add custom MCP server
/sk:mcp-add --id custom-server --command npx --args "-y,my-mcp-server" --priority high
# Test server relevance for work item
/sk:mcp-test WI-001
# Show currently enabled servers
/sk:status # (includes MCP server section)
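The comma-separated server flags shown above could be parsed along these lines. This is a minimal sketch using the standard argparse module; the parser layout and the helper names csv_list and build_start_parser are assumptions, not Solokit's actual CLI code.

```python
import argparse
from typing import List


def csv_list(value: str) -> List[str]:
    """Split 'context7,playwright' into ['context7', 'playwright']."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_start_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the start command and its server flags."""
    parser = argparse.ArgumentParser(prog="sk start")
    parser.add_argument("work_item_id")
    parser.add_argument("--enable-servers", type=csv_list, default=None,
                        help="comma-separated server ids to enable instead of auto-selection")
    parser.add_argument("--disable-servers", type=csv_list, default=None,
                        help="comma-separated server ids to exclude")
    return parser


if __name__ == "__main__":
    args = build_start_parser().parse_args(
        ["WI-001", "--enable-servers", "context7,playwright"]
    )
    print(args.work_item_id, args.enable_servers, args.disable_servers)
```

Leaving both flags as None when absent lets the caller distinguish "no override given" from "override with an empty list", which matches the enable_servers and disable_servers parameters of start_session.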
Configuration:
// .session/config.json (enhanced)
{
"mcp_server_management": {
"enabled": true,
"auto_enable": true,
"max_token_budget": 2000,
"prefer_higher_priority": true,
"manual_overrides": {
"WI-001": {
"enabled_servers": ["context7", "playwright"],
"disabled_servers": []
}
}
}
}
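One way the manual_overrides block might be applied is sketched below. The precedence is an assumption chosen to mirror start_session: an explicit enabled_servers list replaces the automatic selection, and disabled_servers is then subtracted. The function name resolve_servers is hypothetical.

```python
from typing import Any, Dict, List


def resolve_servers(config: Dict[str, Any], work_item_id: str,
                    auto_selected: List[str]) -> List[str]:
    """Combine automatic selection with per-work-item overrides from config."""
    mgmt = config.get("mcp_server_management", {})
    if not mgmt.get("enabled", True):
        return []  # feature switched off entirely
    override = mgmt.get("manual_overrides", {}).get(work_item_id, {})
    # Assumption: a non-empty enabled_servers list replaces the auto selection.
    chosen = override.get("enabled_servers") or auto_selected
    disabled = set(override.get("disabled_servers", []))
    return [server_id for server_id in chosen if server_id not in disabled]


if __name__ == "__main__":
    config = {
        "mcp_server_management": {
            "enabled": True,
            "manual_overrides": {
                "WI-001": {"enabled_servers": ["context7", "playwright"], "disabled_servers": []}
            },
        }
    }
    print(resolve_servers(config, "WI-001", ["postgres"]))  # ['context7', 'playwright']
    print(resolve_servers(config, "WI-002", ["postgres"]))  # ['postgres']
```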
Files Affected:
New:
- src/solokit/mcp/ (will be created) - New module
- src/solokit/mcp/__init__.py (will be created) - Module init
- src/solokit/mcp/server_manager.py (will be created) - MCP server management
- .session/mcp_servers.json (will be created) - Server registry (created per project)
- tests/unit/test_mcp_server_manager.py (will be created) - Unit tests
- tests/integration/test_mcp_integration.py (will be created) - Integration tests
- .claude/commands/mcp-list.md (will be created) - List servers command
- .claude/commands/mcp-add.md (will be created) - Add server command
- .claude/commands/mcp-test.md (will be created) - Test relevance command
Modified:
- src/solokit/session/briefing/orchestrator.py - Add MCP server selection
- src/solokit/session/briefing/formatter.py - Include MCP server info in briefing
- src/solokit/project/init.py - Initialize MCP server registry
- src/solokit/templates/config.schema.json - Add MCP config schema
- .claude/commands/start.md - Document server flags
- .claude/commands/status.md - Show enabled servers
- README.md - Document MCP server management
Benefits:
- Context Efficiency: Only enable relevant servers, save thousands of context tokens
- Intelligent Selection: Automatic server selection based on work context
- Zero Configuration: Servers recommended and configured during init
- Manual Control: Override automatic selection when needed
- Token Budgeting: Respect context window limits with smart budgeting
- Discoverability: Developers learn about useful MCP servers through recommendations
- Template Integration: Project templates include optimal MCP server setups
- Session Awareness: Briefings show which servers are available and why
- Extensibility: Easy to add custom project-specific MCP servers
- Cost Control: Reduce API costs from unused MCP server context
Priority: High
Justification:
- Directly improves context window efficiency (critical for large projects)
- Enhances developer experience with intelligent automation
- Enables better use of MCP ecosystem
- Foundation for advanced MCP integration features
- Aligns with Solokit's philosophy of intelligent automation
Notes:
- MCP servers are registered but disabled by default (zero cost)
- Server selection is context-aware but can be overridden
- Token cost estimates should be calibrated based on actual usage
- Server relevance scoring can be enhanced with AI (future: use Enhancement #37's AI capabilities)
- Works with Enhancement #14 (Template-Based Init) - templates include MCP server recommendations
- Complements Enhancement #15 (Session Briefing Optimization) - reduces context usage
- Related to Enhancement #33 (MCP Server Integration) - both use MCP but for different purposes (#33 makes Solokit an MCP server, #26 manages other MCP servers)
Enhancement #27: Inline Editor Annotations (via MCP)
Status: 🔵 IDENTIFIED
Problem:
When working on a work item, developers have no real-time visibility of Solokit state in their editor:
- No work item status visibility: Can't see if current file relates to an active work item
- No learning hints: Relevant learnings not shown in context
- No quality gate indicators: Must run /validate manually to see issues
- Context switching: Must switch to terminal to check Solokit state
- Lost context: May forget which work item is active
Example scenario:
Developer opens: src/auth/jwt.py
Current experience:
- No indication this relates to work_item_authentication
- No reminder of relevant learnings about JWT
- No warning about quality gate failures
- Must run /status or /validate to see any Solokit info
Desired experience:
Editor shows inline annotations:
┌───────────────────────────────────────────────┐
│ src/auth/jwt.py                               │
│                                               │
│ 📋 Work Item: feature_jwt_authentication      │
│    Status: In Progress (Session 15)           │
│    Priority: High                             │
│                                               │
│ 💡 Relevant Learnings (3):                    │
│    • Always validate JWT signature            │
│    • Use short token expiry (15min)           │
│    • Implement token refresh mechanism        │
│                                               │
│ ⚠️ Quality Gates:                             │
│    ✅ Tests passing (87% coverage)            │
│    ❌ Linting: 2 issues (click to fix)        │
│                                               │
│ def generate_token(user_id):                  │
│     """Generate JWT token"""                  │
│     # ... code ...                            │
└───────────────────────────────────────────────┘
Proposed Solution:
Implement inline editor annotations that display Solokit state contextually in the editor:
- Work Item Status Annotations
  - Show active work item in files being edited
  - Display work item status, priority, progress
  - Link to full work item spec
  - Indicate if file is part of work item scope
- Learning Snippets on Hover
  - Detect relevant learnings based on file/function
  - Show learning snippets on hover
  - Link to full learning details
  - "Learn more" expands learning context
- Quality Gate Indicators
  - Show quality gate status inline
  - Highlight failing tests in test files
  - Show linting errors with quick fixes
  - Display coverage gaps
- Dependency Warnings
  - Warn if editing file that's part of blocked work item
  - Show dependency chain
  - Suggest unblocking actions
- Session Context
  - Show current session info
  - Display session time/progress
  - Link to session briefing
  - Quick access to /end or /validate
Implementation:
Requires Enhancement #33 (MCP Server) as foundation
MCP-based annotation provider:
# src/solokit/mcp/annotations.py
from pathlib import Path
from typing import List, Dict, Any


class AnnotationProvider:
    """Provide annotations for editor via MCP"""

    def get_annotations_for_file(self, file_path: str) -> List[Dict[str, Any]]:
        """Get all annotations for a file"""
        annotations = []

        # Work item annotations
        work_item_annotations = self.get_work_item_annotations(file_path)
        annotations.extend(work_item_annotations)

        # Learning annotations
        learning_annotations = self.get_learning_annotations(file_path)
        annotations.extend(learning_annotations)

        # Quality gate annotations
        quality_annotations = self.get_quality_gate_annotations(file_path)
        annotations.extend(quality_annotations)
        return annotations

    def get_work_item_annotations(self, file_path: str) -> List[Dict[str, Any]]:
        """Get work item status annotations"""
        from solokit.session.status import get_session_status
        from solokit.work_items.repository import WorkItemRepository

        # Check if there's an active session
        status = get_session_status()
        if not status.get("active_work_item"):
            return []
        work_item_id = status["active_work_item"]
        repository = WorkItemRepository()
        work_item = repository.get_work_item(work_item_id)

        # Check if file is in work item's git branch commits
        if not self.file_in_work_item_scope(file_path, work_item_id):
            return []
        return [{
            "type": "info",
            "position": {"line": 0, "character": 0},
            "message": f"📋 Work Item: {work_item['title']}",
            "details": {
                "work_item_id": work_item_id,
                "status": work_item["status"],
                "priority": work_item["priority"],
                "session": status.get("session_number")
            },
            "actions": [
                {"label": "View Spec", "command": f"solokit:work-show {work_item_id}"},
                {"label": "End Session", "command": "solokit:end"}
            ]
        }]

    def get_learning_annotations(self, file_path: str) -> List[Dict[str, Any]]:
        """Get relevant learning annotations"""
        from solokit.session.briefing.learning_loader import get_relevant_learnings

        # Analyze file to determine topic
        topic = self.extract_topic_from_file(file_path)
        if not topic:
            return []

        # Get relevant learnings
        learnings = get_relevant_learnings(topic, limit=3)
        if not learnings:
            return []

        # Create annotation
        learning_texts = [
            f"• {learning['content'][:80]}..."
            for learning in learnings[:3]
        ]
        return [{
            "type": "info",
            "position": {"line": 0, "character": 0},
            "message": f"💡 Relevant Learnings ({len(learnings)}):",
            "details": {
                "learnings": learnings,
                "topic": topic
            },
            "hover_content": "\n".join(learning_texts),
            "actions": [
                {"label": "View All", "command": f"solokit:learn-search {topic}"}
            ]
        }]

    def get_quality_gate_annotations(self, file_path: str) -> List[Dict[str, Any]]:
        """Get quality gate status annotations"""
        from solokit.quality.gates import QualityGateRunner
        runner = QualityGateRunner()

        # Get latest quality gate results
        results = runner.get_latest_results()
        if not results:
            return []
        annotations = []

        # Check if file has linting issues
        if "linting" in results and not results["linting"]["passed"]:
            linting_issues = self.get_linting_issues_for_file(file_path, results["linting"])
            for issue in linting_issues:
                annotations.append({
                    "type": "warning",
                    "position": {"line": issue["line"], "character": issue["column"]},
                    "message": f"⚠️ Lint: {issue['message']}",
                    "details": issue,
                    "actions": [
                        {"label": "Fix", "command": "solokit:validate --fix"}
                    ] if issue.get("fixable") else []
                })

        # Check if file has test coverage gaps
        if "tests" in results:
            coverage_gaps = self.get_coverage_gaps_for_file(file_path, results["tests"])
            if coverage_gaps:
                annotations.append({
                    "type": "info",
                    "position": {"line": 0, "character": 0},
                    "message": f"📊 Coverage: {coverage_gaps['percentage']}% (target: {coverage_gaps['target']}%)",
                    "details": coverage_gaps,
                    "hover_content": f"Uncovered lines: {coverage_gaps['uncovered_lines']}"
                })
        return annotations

    def extract_topic_from_file(self, file_path: str) -> str:
        """Extract main topic/keywords from file"""
        path = Path(file_path)
        # Use file name and path components as topic
        parts = path.stem.split("_")
        parent_parts = path.parent.name.split("_")
        topic = " ".join(parts + parent_parts)
        return topic

    def file_in_work_item_scope(self, file_path: str, work_item_id: str) -> bool:
        """Check if file was modified in work item's branch"""
        # Implementation: check git log for work item's branch
        pass


# MCP tool for annotations
async def get_file_annotations(params: Dict[str, Any]) -> Dict[str, Any]:
    """MCP tool to get annotations for a file"""
    file_path = params["file_path"]
    provider = AnnotationProvider()
    annotations = provider.get_annotations_for_file(file_path)
    return {
        "file_path": file_path,
        "annotations": annotations,
        "count": len(annotations)
    }
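The file_in_work_item_scope stub above leaves the git check open. One possible approach, sketched here under assumptions, is to list the files that differ between a base branch and the work item's branch with git diff --name-only and test membership. The function names, the "main" default, and the mapping from work item to branch name are all hypothetical.

```python
import subprocess
from pathlib import PurePosixPath
from typing import List


def parse_changed_files(git_output: str) -> List[str]:
    """Turn `git diff --name-only` output into a list of repo-relative paths."""
    return [line.strip() for line in git_output.splitlines() if line.strip()]


def changed_files_on_branch(branch: str, base: str = "main") -> List[str]:
    """Files changed on `branch` since it diverged from `base` (three-dot diff)."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...{branch}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_changed_files(result.stdout)


def file_in_scope(file_path: str, changed: List[str]) -> bool:
    """True if file_path is among the changed files (compared as POSIX paths)."""
    target = PurePosixPath(file_path).as_posix()
    return target in {PurePosixPath(p).as_posix() for p in changed}


if __name__ == "__main__":
    sample_output = "src/auth/jwt.py\ntests/test_jwt.py\n"
    changed = parse_changed_files(sample_output)
    print(file_in_scope("src/auth/jwt.py", changed))
    print(file_in_scope("src/auth/oauth.py", changed))
```

Separating the parsing and membership test from the subprocess call keeps most of the logic testable without a git repository.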
Claude Code MCP Tool Registration:
# Add to src/solokit/mcp/server.py
self.server.add_tool(Tool(
name="sdd_file_annotations",
description="Get Solokit annotations for a file (work item status, learnings, quality gates)",
parameters={
"type": "object",
"properties": {
"file_path": {"type": "string"}
},
"required": ["file_path"]
},
handler=get_file_annotations
))
Files Affected:
New:
- src/solokit/mcp/annotations.py (will be created) - Annotation provider
- tests/unit/test_annotations.py (will be created) - Annotation tests
Modified:
- src/solokit/mcp/server.py (will be created) - Add annotation MCP tool
- docs/mcp/tools.md (will be created) - Document annotation tool
Benefits:
- Context awareness: See Solokit state without leaving editor
- Learning reminders: Relevant learnings shown in context
- Proactive quality: See issues as you code
- Reduced context switching: Less terminal usage
- Better focus: All info in one place
- Discoverability: Learn about Solokit features through annotations
- Productivity: Faster access to relevant information
Priority: Medium - Nice to have, enhances developer experience
Dependencies:
- Requires Enhancement #33 (MCP Server) - Foundation for annotations
- Requires Claude Code support for displaying annotations (may need API/protocol discussion with Anthropic)
Notes:
- Implementation depends on Claude Code's annotation/diagnostic API
- May need custom protocol if Claude Code doesn't have annotation support yet
- Could start with simpler "status bar" annotations before full inline support
- Performance: Annotations should be computed lazily and cached
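The note about lazy, cached computation could be sketched as follows. This is only an illustration: `CachedAnnotations` and its `compute` callback are hypothetical names, and using the file's modification time as the invalidation signal is an assumption rather than a decided design.

```python
from pathlib import Path
from typing import Any, Callable, Dict, List, Tuple

Annotation = Dict[str, Any]


class CachedAnnotations:
    """Compute annotations on first request and reuse them until the file changes."""

    def __init__(self, compute: Callable[[str], List[Annotation]]):
        # `compute` stands in for the real (expensive) annotation lookup.
        self._compute = compute
        self._cache: Dict[str, Tuple[int, List[Annotation]]] = {}

    def get(self, file_path: str) -> List[Annotation]:
        mtime = Path(file_path).stat().st_mtime_ns
        cached = self._cache.get(file_path)
        if cached is not None and cached[0] == mtime:
            return cached[1]  # file unchanged: skip recomputation
        annotations = self._compute(file_path)
        self._cache[file_path] = (mtime, annotations)
        return annotations
```

Nothing is computed until `get` is called for a file, and repeated calls for an unmodified file return the stored result.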
Enhancement #28: Advanced Testing Types
Status: 🔵 IDENTIFIED
Problem:
Basic unit and integration tests don't catch all issues:
- Mutation testing: Tests may pass even if they don't catch bugs
- Contract testing: API changes break clients unexpectedly
- Accessibility testing: WCAG compliance not validated
- Visual regression: UI changes undetected
Example:
API change: Remove field "user.email"
→ Unit tests pass ✓ (don't test this field)
→ Integration tests pass ✓ (don't use this field)
→ Deploy
→ Mobile app breaks ✗ (depends on user.email)
Proposed Solution:
Implement advanced testing types that catch issues traditional tests miss:
- Mutation Testing
  - Inject bugs into code (mutants)
  - Verify tests catch the bugs
  - Mutation score = % mutants killed
  - Tools: Stryker (JS/TS), mutmut (Python)
- Contract Testing
  - Define API contracts
  - Test provider adheres to contract
  - Test consumer expectations met
  - Detect breaking changes early
  - Tools: Pact, Spring Cloud Contract
- Accessibility Testing
  - Validate WCAG 2.1 AA compliance
  - Test keyboard navigation
  - Test screen reader compatibility
  - Check color contrast
  - Tools: axe-core, Pa11y, Lighthouse
- Visual Regression Testing
  - Capture screenshots of UI
  - Compare against baseline
  - Detect unintended visual changes
  - Tools: Percy, Chromatic, BackstopJS
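The mutation score described above is simple arithmetic. A minimal sketch, assuming the mutation tool reports the number of killed mutants and the total; the function names are hypothetical and the 75% default mirrors the example threshold used elsewhere in this enhancement:

```python
def mutation_score(killed: int, total: int) -> float:
    """Percentage of injected mutants that the test suite detected (killed)."""
    if total <= 0:
        raise ValueError("total mutants must be positive")
    return 100.0 * killed / total


def passes_threshold(killed: int, total: int, threshold: float = 75.0) -> bool:
    """True when the mutation score meets or exceeds the configured threshold."""
    return mutation_score(killed, total) >= threshold
```

For instance, 30 killed mutants out of 40 gives a score of 75.0, which just meets a 75% threshold.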
Implementation:
Mutation testing:
# src/solokit/testing/mutation_tester.py
class MutationTester:
    def run_mutation_tests(self, test_suite):
        # Run Stryker or mutmut
        # Generate mutants
        # Check if tests kill mutants
        # Calculate mutation score
        raise NotImplementedError

    def check_mutation_score(self, score, threshold):
        # Fail if score < threshold (e.g., 75%)
        raise NotImplementedError
Contract testing:
# src/solokit/testing/contract_tester.py
class ContractTester:
    def define_contract(self, api_spec):
        # Create Pact contract from OpenAPI spec
        raise NotImplementedError

    def test_provider(self, contract):
        # Verify API adheres to contract
        raise NotImplementedError

    def test_consumer(self, contract):
        # Verify client expectations met
        raise NotImplementedError

    def detect_breaking_changes(self, old_contract, new_contract):
        # Compare contracts, find breaking changes
        raise NotImplementedError
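To make the consumer side concrete, here is a small sketch of checking a provider response against the fields a consumer depends on, using the `user.email` scenario from the example. It is not Pact; `missing_fields` and the dotted-path notation are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, List


def missing_fields(response: Dict[str, Any], expected: Iterable[str]) -> List[str]:
    """Return the dotted field paths a consumer expects but the response lacks."""
    missing = []
    for path in expected:
        node: Any = response
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing
```

Run against a response that no longer contains `user.email`, the check reports that path before deployment rather than after the mobile app breaks.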
Accessibility testing:
# src/solokit/testing/accessibility_tester.py
class AccessibilityTester:
    def run_axe_audit(self, url):
        # Run axe-core accessibility audit
        # Return violations
        raise NotImplementedError

    def check_wcag_compliance(self, violations):
        # Verify WCAG 2.1 AA compliance
        # Fail if critical violations
        raise NotImplementedError
Visual regression:
# src/solokit/testing/visual_tester.py
class VisualTester:
    def capture_screenshots(self, urls):
        # Capture screenshots of pages
        raise NotImplementedError

    def compare_with_baseline(self, screenshots):
        # Compare with baseline images
        # Detect differences
        raise NotImplementedError

    def update_baseline(self, screenshots):
        # Update baseline on approval
        raise NotImplementedError
Configuration:
// .session/config.json
"advanced_testing": {
"mutation_testing": {
"enabled": true,
"threshold": 75,
"framework": "stryker" // or "mutmut"
},
"contract_testing": {
"enabled": true,
"format": "pact",
"break_on_breaking_changes": true
},
"accessibility_testing": {
"enabled": true,
"standard": "WCAG21AA",
"fail_on_violations": true
},
"visual_regression": {
"enabled": true,
"threshold": 0.02 // 2% pixel difference
}
}
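One way to read the `visual_regression.threshold` value is as the fraction of pixels allowed to differ from the baseline. The sketch below assumes that interpretation and represents images as flat sequences of RGB tuples; real tools typically also apply per-pixel tolerance and anti-aliasing handling, which is omitted here. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]


def diff_ratio(baseline: Sequence[Pixel], current: Sequence[Pixel]) -> float:
    """Fraction of pixels that differ between two same-sized images."""
    if len(baseline) != len(current):
        raise ValueError("images must have the same number of pixels")
    if not baseline:
        return 0.0
    changed = sum(1 for a, b in zip(baseline, current) if a != b)
    return changed / len(baseline)


def is_regression(baseline: Sequence[Pixel], current: Sequence[Pixel],
                  threshold: float = 0.02) -> bool:
    """Flag a visual regression when more than `threshold` of the pixels changed."""
    return diff_ratio(baseline, current) > threshold
```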
Files Affected:
New:
- src/solokit/testing/mutation_tester.py - Mutation testing (will be created)
- src/solokit/testing/contract_tester.py - Contract testing (will be created)
- src/solokit/testing/accessibility_tester.py - Accessibility testing (will be created)
- src/solokit/testing/visual_tester.py - Visual regression testing (will be created)
- tests/contracts/ - Contract definitions (will be created)
- tests/visual/baselines/ - Visual baseline images (will be created)
- Tests for advanced testing modules
Modified:
- src/solokit/quality/gates.py - Add advanced testing gates
- .session/config.json - Advanced testing configuration
- CI/CD workflows - Add advanced testing jobs
Benefits:
- Better test quality: Mutation testing ensures tests catch bugs
- API stability: Contract testing prevents breaking changes
- Accessibility compliance: Automated WCAG validation
- UI stability: Visual regression catches unintended changes
- Broader coverage: Catches classes of issues that unit and integration tests miss
Priority: Medium - Improves test effectiveness
Enhancement #29: Frontend Quality & Design System Compliance
Status: 🔵 IDENTIFIED
Problem:
Frontend code has unique quality concerns that general linting and testing don't address. Projects with design systems document standards but lack automated enforcement:
- Design System Non-Compliance:
  - Developers accidentally use hardcoded colors instead of design tokens
  - Inconsistent spacing values (13px, 17px) instead of standard scale (8px, 16px, 24px)
  - Custom font sizes instead of typography scale
  - Direct HTML elements instead of design system components
  - Design debt accumulates silently
- Framework-Specific Issues:
  - React hooks violations (dependencies, conditional usage)
  - Vue composition API anti-patterns
  - Next.js Image component not used (missing optimization)
  - Framework best practices not enforced
- Responsive Design Problems:
  - Non-standard breakpoints used inconsistently
  - Fixed widths without max-width
  - Missing mobile-first CSS
  - Inconsistent responsive patterns
- Accessibility Gaps:
  - Non-semantic HTML (divs instead of buttons, headings)
  - Missing ARIA attributes
  - Poor color contrast
  - Keyboard navigation broken
  - Current testing (#28) only catches regressions, not initial violations
- Bundle Size Bloat:
  - No monitoring of bundle size over time
  - Large dependencies added without review
  - Missing code-splitting for large components
  - Performance budget violations
- CSS Quality Issues:
  - Overuse of !important
  - High CSS specificity causing maintainability issues
  - Duplicate selectors
  - Inconsistent naming conventions
Proposed Solution:
Implement frontend-specific quality gates that enforce design system compliance, framework best practices, responsive design patterns, accessibility standards, and bundle size limits:
- Design Token Compliance
  - Parse design tokens from CSS/SCSS/JS/Tailwind config
  - Detect hardcoded values in code
  - Validate against approved token values
  - Auto-fix where possible
- Component Library Enforcement
  - Detect direct HTML when component exists
  - Validate component prop usage
  - Detect deprecated components
  - Suggest correct component variants
- Framework Best Practices
  - React: hooks rules, memo usage, key props
  - Vue: composition API patterns, reactivity
  - Next.js: Image, Link, font optimization
  - Svelte: reactive statement patterns
- Responsive Design Validation
  - Enforce standard breakpoints only
  - Validate mobile-first approach
  - Detect problematic fixed widths
  - CSS ordering validation
- Accessibility Enforcement
  - Semantic HTML requirements
  - ARIA attribute validation
  - Color contrast checking (build-time)
  - Keyboard navigation testing
  - Focus management validation
- Bundle Size Monitoring
  - Track bundle size over time
  - Alert on size increases > threshold
  - Identify large dependencies
  - Enforce code-splitting requirements
- CSS Quality Standards
  - !important usage limits
  - Naming convention enforcement
  - Specificity limits
  - Duplicate selector detection
Implementation:
Design token validation:
# src/solokit/quality/frontend/design_tokens.py
from typing import Dict, List, Any
import re
from pathlib import Path
import json


class DesignTokenValidator:
    """Validate frontend code against design tokens"""

    def __init__(self, tokens_file: Path):
        self.tokens = self._load_tokens(tokens_file)
        self.violations = []

    def _load_tokens(self, tokens_file: Path) -> Dict[str, Any]:
        """Load design tokens from JSON/JS file"""
        if not tokens_file.exists():
            return {}
        content = tokens_file.read_text()
        # Handle both JSON and JS exports
        if tokens_file.suffix == ".json":
            return json.loads(content)
        elif tokens_file.suffix in [".js", ".ts"]:
            # Extract tokens from JS/TS export
            # This is simplified - production would use AST parsing
            match = re.search(r'export\s+(?:default\s+)?({.*})', content, re.DOTALL)
            if match:
                try:
                    return json.loads(match.group(1))
                except json.JSONDecodeError:
                    # Object literal is not strict JSON (unquoted keys, trailing commas, ...)
                    return {}
        return {}

    def validate_file(self, file_path: Path) -> List[Dict[str, Any]]:
        """Validate a single file for design token compliance"""
        content = file_path.read_text()
        violations = []
        # Check for hardcoded colors: hex values (#rgb ... #rrggbbaa) or rgb()/rgba()
        color_pattern = r'(?:color|background|border-color):\s*(#[0-9a-fA-F]{3,8}\b|rgba?\([^)]+\))'
        for match in re.finditer(color_pattern, content):
            line_num = content[:match.start()].count('\n') + 1
            violations.append({
                "file": str(file_path),
                "line": line_num,
                "type": "hardcoded_color",
                "value": match.group(0),
                "message": f"Hardcoded color found. Use design token from colors.{self._suggest_color_token(match.group(1))}",
                "severity": "error"
            })
        # Check for hardcoded spacing
        spacing_pattern = r'(?:margin|padding|gap):\s*(\d+)px'
        for match in re.finditer(spacing_pattern, content):
            spacing_value = int(match.group(1))
            if spacing_value not in self.tokens.get("spacing", {}).values():
                line_num = content[:match.start()].count('\n') + 1
                violations.append({
                    "file": str(file_path),
                    "line": line_num,
                    "type": "hardcoded_spacing",
                    "value": f"{spacing_value}px",
                    "message": f"Non-standard spacing. Use spacing scale: {list(self.tokens.get('spacing', {}).values())}",
                    "severity": "error"
                })
        # Check for hardcoded font sizes
        font_pattern = r'font-size:\s*(\d+)px'
        for match in re.finditer(font_pattern, content):
            font_size = int(match.group(1))
            if font_size not in self.tokens.get("typography", {}).get("sizes", {}).values():
                line_num = content[:match.start()].count('\n') + 1
                violations.append({
                    "file": str(file_path),
                    "line": line_num,
                    "type": "hardcoded_font_size",
                    "value": f"{font_size}px",
                    "message": f"Non-standard font size. Use typography scale: {list(self.tokens.get('typography', {}).get('sizes', {}).values())}",
                    "severity": "error"
                })
        return violations

    def _suggest_color_token(self, color_value: str) -> str:
        """Suggest appropriate color token for a value"""
        # Simplified - production would do color similarity matching
        colors = self.tokens.get("colors", {})
        wanted = color_value.lower()
        for name, value in colors.items():
            # Compare case-insensitively on both sides
            if str(value).lower() == wanted:
                return name
        return "primary|secondary|error|..."


class ComponentLibraryValidator:
    """Validate component library usage"""

    def __init__(self, component_library: str):
        self.component_library = component_library
        self.violations = []

    def validate_file(self, file_path: Path) -> List[Dict[str, Any]]:
        """Validate component library usage"""
        content = file_path.read_text()
        violations = []
        # Detect raw HTML buttons instead of Button component
        button_pattern = r'<button[^>]*>'
        for match in re.finditer(button_pattern, content):
            # Check if this is inside a custom Button component definition
            if not self._is_in_component_definition(content, match.start(), "Button"):
                line_num = content[:match.start()].count('\n') + 1
                violations.append({
                    "file": str(file_path),
                    "line": line_num,
                    "type": "raw_html_element",
                    "element": "button",
                    "message": f"Use <Button> from {self.component_library} instead of raw <button>",
                    "severity": "error"
                })
        # Detect raw links instead of Link component
        link_pattern = r'<a\s+href='
        for match in re.finditer(link_pattern, content):
            if not self._is_in_component_definition(content, match.start(), "Link"):
                line_num = content[:match.start()].count('\n') + 1
                violations.append({
                    "file": str(file_path),
                    "line": line_num,
                    "type": "raw_html_element",
                    "element": "a",
                    "message": f"Use <Link> from {self.component_library} for internal navigation",
                    "severity": "warning"
                })
        return violations

    def _is_in_component_definition(self, content: str, position: int, component_name: str) -> bool:
        """Check if position is inside a component definition"""
        # Simplified - production would use AST
        before = content[:position]
        return f"function {component_name}" in before or f"const {component_name} =" in before
Bundle size monitoring:
# src/solokit/quality/frontend/bundle_size.py
from typing import Dict, List, Any
from pathlib import Path
import json
import subprocess


class BundleSizeMonitor:
    """Monitor and enforce bundle size limits"""

    def __init__(self, config: Dict[str, Any]):
        self.max_size_mb = config.get("max_size_mb", 0.5)
        self.max_increase_percent = config.get("max_increase_percent", 5)
        self.history_file = Path(".session/bundle_size_history.json")

    def check_bundle_size(self, build_dir: Path) -> Dict[str, Any]:
        """Check current bundle size against limits"""
        current_sizes = self._get_bundle_sizes(build_dir)
        history = self._load_history()
        violations = []
        for bundle_name, current_size in current_sizes.items():
            # Check absolute size
            size_mb = current_size / (1024 * 1024)
            if size_mb > self.max_size_mb:
                violations.append({
                    "bundle": bundle_name,
                    "type": "size_limit_exceeded",
                    "current_size_mb": size_mb,
                    "max_size_mb": self.max_size_mb,
                    "message": f"Bundle size ({size_mb:.2f}MB) exceeds limit ({self.max_size_mb}MB)",
                    "severity": "error"
                })
            # Check size increase (skip empty previous bundles to avoid division by zero)
            previous_size = history.get(bundle_name, 0)
            if previous_size > 0:
                increase_percent = ((current_size - previous_size) / previous_size) * 100
                if increase_percent > self.max_increase_percent:
                    violations.append({
                        "bundle": bundle_name,
                        "type": "size_increase_exceeded",
                        "increase_percent": increase_percent,
                        "max_increase_percent": self.max_increase_percent,
                        "previous_size_mb": previous_size / (1024 * 1024),
                        "current_size_mb": size_mb,
                        "message": f"Bundle size increased by {increase_percent:.1f}% (limit: {self.max_increase_percent}%)",
                        "severity": "warning"
                    })
        # Update history
        self._save_history(current_sizes)
        return {
            "violations": violations,
            "current_sizes": current_sizes,
            "analysis": self._analyze_bundle(build_dir)
        }

    def _get_bundle_sizes(self, build_dir: Path) -> Dict[str, int]:
        """Get sizes of all bundles"""
        sizes = {}
        # Find all JS files in build directory
        for js_file in build_dir.rglob("*.js"):
            if js_file.is_file():
                # Key by relative path so same-named files in different folders do not collide
                sizes[str(js_file.relative_to(build_dir))] = js_file.stat().st_size
        return sizes

    def _analyze_bundle(self, build_dir: Path) -> Dict[str, Any]:
        """Analyze bundle composition"""
        # Run webpack-bundle-analyzer or similar
        # This is simplified - production would integrate with bundler
        return {
            "largest_dependencies": [],
            "duplicate_code": [],
            "recommendations": []
        }

    def _load_history(self) -> Dict[str, int]:
        """Load bundle size history"""
        if not self.history_file.exists():
            return {}
        return json.loads(self.history_file.read_text())

    def _save_history(self, sizes: Dict[str, int]):
        """Save current sizes to history"""
        self.history_file.parent.mkdir(parents=True, exist_ok=True)
        self.history_file.write_text(json.dumps(sizes, indent=2))
Accessibility validation:
# src/solokit/quality/frontend/accessibility.py
from typing import List, Dict, Any
from pathlib import Path
import re
import subprocess


class AccessibilityValidator:
    """Validate accessibility standards"""

    def __init__(self, config: Dict[str, Any]):
        self.wcag_level = config.get("wcag_level", "AA")
        self.check_color_contrast = config.get("check_color_contrast", True)

    def validate_semantic_html(self, file_path: Path) -> List[Dict[str, Any]]:
        """Validate semantic HTML usage"""
        content = file_path.read_text()
        violations = []
        # Check for divs that should be buttons
        div_click_pattern = r'<div[^>]*onClick'
        for match in re.finditer(div_click_pattern, content):
            line_num = content[:match.start()].count('\n') + 1
            violations.append({
                "file": str(file_path),
                "line": line_num,
                "type": "non_semantic_html",
                "message": "Use <button> instead of <div onClick>. Divs are not keyboard accessible.",
                "severity": "error",
                "wcag": "WCAG 2.1.1 (Keyboard)"
            })
        # Check for missing alt text on images
        img_pattern = r'<img(?![^>]*alt=)'
        for match in re.finditer(img_pattern, content):
            line_num = content[:match.start()].count('\n') + 1
            violations.append({
                "file": str(file_path),
                "line": line_num,
                "type": "missing_alt_text",
                "message": "Image missing alt attribute",
                "severity": "error",
                "wcag": "WCAG 1.1.1 (Non-text Content)"
            })
        return violations

    def run_axe_core(self, url: str) -> Dict[str, Any]:
        """Run an automated accessibility audit against a URL"""
        # This sketch shells out to pa11y; running axe-core itself
        # (e.g. through Playwright) would be an alternative
        result = subprocess.run(
            ["npx", "pa11y", "--standard", f"WCAG2{self.wcag_level}", url],
            capture_output=True,
            text=True
        )
        return {
            "passed": result.returncode == 0,
            "violations": self._parse_pa11y_output(result.stdout)
        }

    def _parse_pa11y_output(self, output: str) -> List[Dict[str, Any]]:
        """Parse pa11y output into violations"""
        # Simplified parser
        violations = []
        for line in output.split('\n'):
            if 'Error:' in line:
                violations.append({
                    "message": line.strip(),
                    "severity": "error"
                })
        return violations
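The validator above reads `check_color_contrast` but does not show the check itself. As an illustrative sketch, the WCAG contrast ratio can be computed from two hex colors using the relative luminance formula from the WCAG definition; the 4.5:1 (normal text) and 3:1 (large text) limits are the level AA minimums. The function names are hypothetical and only 3- or 6-digit hex colors are handled.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value (WCAG definition)."""
    s = channel / 255
    return s / 12.92 if s <= 0.03928 else ((s + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    h = hex_color.lstrip("#")
    if len(h) == 3:  # expand shorthand such as #fff
        h = "".join(ch * 2 for ch in h)
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """WCAG AA requires at least 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

Because it only needs the two color values, a check like this could run at build time over resolved design tokens rather than against a rendered page.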
Frontend quality gate integration:
# src/solokit/quality/gates.py (enhanced)
import json
import subprocess
from pathlib import Path
from typing import Any, Dict, List


class FrontendQualityGate:
    """Frontend-specific quality gate"""

    def __init__(self, config: Dict[str, Any]):
        self.config = config.get("frontend", {})
        self.design_system_config = self.config.get("design_system", {})
        self.bundle_size_config = self.config.get("bundle_size", {})
        self.accessibility_config = self.config.get("accessibility", {})

    def validate(self) -> Dict[str, Any]:
        """Run all frontend quality checks"""
        results = {
            "passed": True,
            "checks": {}
        }
        # Design token compliance
        if self.design_system_config.get("enabled"):
            token_validator = DesignTokenValidator(
                Path(self.design_system_config["tokens_file"])
            )
            violations = []
            for file in self._get_frontend_files():
                violations.extend(token_validator.validate_file(file))
            results["checks"]["design_tokens"] = {
                "passed": len(violations) == 0,
                "violations": violations
            }
            if violations:
                results["passed"] = False
        # Component library compliance
        if self.design_system_config.get("component_library"):
            component_validator = ComponentLibraryValidator(
                self.design_system_config["component_library"]
            )
            violations = []
            for file in self._get_frontend_files():
                violations.extend(component_validator.validate_file(file))
            results["checks"]["component_library"] = {
                "passed": len(violations) == 0,
                "violations": violations
            }
            if violations:
                results["passed"] = False
        # Bundle size
        if self.bundle_size_config.get("enabled"):
            bundle_monitor = BundleSizeMonitor(self.bundle_size_config)
            build_dir = Path(self.bundle_size_config.get("build_dir", "build"))
            bundle_result = bundle_monitor.check_bundle_size(build_dir)
            results["checks"]["bundle_size"] = bundle_result
            if bundle_result["violations"]:
                results["passed"] = False
        # Accessibility
        if self.accessibility_config.get("enabled"):
            a11y_validator = AccessibilityValidator(self.accessibility_config)
            violations = []
            for file in self._get_frontend_files():
                violations.extend(a11y_validator.validate_semantic_html(file))
            results["checks"]["accessibility"] = {
                "passed": len(violations) == 0,
                "violations": violations
            }
            if violations:
                results["passed"] = False
        # Framework-specific linting (run via ESLint plugins)
        results["checks"]["framework_linting"] = self._run_framework_linting()
        if not results["checks"]["framework_linting"]["passed"]:
            results["passed"] = False
        return results

    def _get_frontend_files(self) -> List[Path]:
        """Get all frontend source files"""
        extensions = [".jsx", ".tsx", ".vue", ".svelte", ".css", ".scss"]
        files = []
        src_dir = Path("src")
        if src_dir.exists():
            for ext in extensions:
                files.extend(src_dir.rglob(f"*{ext}"))
        return files

    def _run_framework_linting(self) -> Dict[str, Any]:
        """Run framework-specific ESLint rules"""
        result = subprocess.run(
            ["npx", "eslint", "src/", "--format", "json"],
            capture_output=True,
            text=True
        )
        try:
            lint_results = json.loads(result.stdout)
            error_count = sum(r["errorCount"] for r in lint_results)
            return {
                "passed": error_count == 0,
                "results": lint_results
            }
        except (json.JSONDecodeError, KeyError, TypeError):
            # Unparseable output (e.g. ESLint not installed) is treated as "skipped", not failed
            return {"passed": True, "results": []}
Configuration:
// .session/config.json (enhanced)
{
"quality_gates": {
"frontend": {
"enabled": true,
"design_system": {
"enabled": true,
"tokens_file": "src/design-tokens.json",
"strict_mode": true,
"allowed_exceptions": ["src/legacy/**"],
"component_library": "@company/design-system"
},
"component_library": {
"enabled": true,
"library": "@company/design-system",
"enforce_usage": true
},
"bundle_size": {
"enabled": true,
"max_size_mb": 0.5,
"max_increase_percent": 5,
"build_dir": "build"
},
"responsive": {
"enabled": true,
"breakpoints": ["640px", "768px", "1024px", "1280px"],
"mobile_first": true
},
"accessibility": {
"enabled": true,
"wcag_level": "AA",
"check_color_contrast": true,
"semantic_html_required": true
},
"framework_linting": {
"enabled": true,
"framework": "react",
"rules": {
"react-hooks": "error",
"jsx-a11y": "error"
}
}
}
}
}
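The `responsive` settings above (`breakpoints`, `mobile_first`) have no accompanying validator in this proposal. A minimal sketch of how they might be enforced, assuming plain CSS input and pixel-based media queries; `find_breakpoint_violations` is a hypothetical helper, and the regex only inspects the last width condition in each `@media` rule:

```python
import re
from typing import List, Sequence, Tuple

# Matches e.g. "@media (min-width: 768px)" and captures min/max plus the pixel value.
MEDIA_WIDTH = re.compile(r"@media[^{]*\((min|max)-width:\s*(\d+)px\)")


def find_breakpoint_violations(
    css: str, allowed: Sequence[str], mobile_first: bool = True
) -> List[Tuple[int, str]]:
    """Return (line, message) pairs for non-standard or non-mobile-first media queries."""
    allowed_px = {int(b.replace("px", "")) for b in allowed}
    violations = []
    for match in MEDIA_WIDTH.finditer(css):
        line = css.count("\n", 0, match.start()) + 1
        kind, width = match.group(1), int(match.group(2))
        if width not in allowed_px:
            violations.append((line, f"non-standard breakpoint {width}px"))
        elif mobile_first and kind == "max":
            violations.append((line, f"max-width query at {width}px is not mobile-first"))
    return violations
```

Treating every `max-width` query as a mobile-first violation is a deliberately strict reading; a real gate would likely allow exceptions.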
ESLint configuration (.eslintrc.json):
{
"extends": [
"react-app",
"plugin:jsx-a11y/recommended",
"plugin:react-hooks/recommended"
],
"plugins": ["jsx-a11y", "react-hooks"],
"rules": {
"react-hooks/rules-of-hooks": "error",
"react-hooks/exhaustive-deps": "warn",
"jsx-a11y/alt-text": "error",
"jsx-a11y/anchor-is-valid": "error",
"jsx-a11y/click-events-have-key-events": "error",
"jsx-a11y/no-static-element-interactions": "error"
}
}
StyleLint configuration (.stylelintrc.json):
{
"extends": ["stylelint-config-standard"],
"plugins": ["stylelint-declaration-strict-value"],
"rules": {
"scale-unlimited/declaration-strict-value": [
["/color/", "fill", "stroke"],
{
"ignoreValues": ["transparent", "inherit", "currentColor"]
}
],
"declaration-no-important": true,
"selector-max-specificity": "0,4,0",
"max-nesting-depth": 3
}
}
Commands:
# Run frontend quality gates
/sk:validate --frontend
# Run specific frontend checks
/sk:frontend-check --design-tokens
/sk:frontend-check --bundle-size
/sk:frontend-check --accessibility
# Analyze bundle size
/sk:bundle-analyze
# Check design token compliance
/sk:design-tokens-check
# Auto-fix design token violations (where possible)
/sk:design-tokens-fix
Files Affected:
New:
- src/solokit/quality/frontend/ - New module (will be created)
- src/solokit/quality/frontend/__init__.py - Module init (will be created)
- src/solokit/quality/frontend/design_tokens.py - Design token validation (will be created)
- src/solokit/quality/frontend/component_library.py - Component library validation (will be created)
- src/solokit/quality/frontend/bundle_size.py - Bundle size monitoring (will be created)
- src/solokit/quality/frontend/accessibility.py - Accessibility validation (will be created)
- src/solokit/quality/frontend/responsive.py - Responsive design validation (will be created)
- .session/bundle_size_history.json - Bundle size tracking (will be created)
- tests/unit/test_frontend_quality.py - Unit tests (will be created)
- tests/integration/test_frontend_gates.py - Integration tests (will be created)
- .claude/commands/frontend-check.md - Frontend check command (will be created)
- .claude/commands/bundle-analyze.md - Bundle analysis command (will be created)
- .claude/commands/design-tokens-check.md - Design token check command (will be created)
Modified:
- src/solokit/quality/gates.py - Add frontend quality gate
- src/solokit/templates/config.schema.json - Add frontend quality config schema
- .claude/commands/validate.md - Document frontend validation
- README.md - Document frontend quality gates
Benefits:
- Automated Design System Enforcement: No manual reviews needed for design token compliance
- Prevents Design Debt: Catch violations before they accumulate
- Framework Best Practices: Enforce React hooks rules, Next.js optimizations, etc.
- Accessibility Built-In: WCAG compliance validated automatically
- Bundle Size Control: Prevent performance regressions from bloat
- Consistent Frontend Code: Uniform patterns across the codebase
- Faster Reviews: Automated checks reduce manual review time
- Learning Tool: Developers learn design system through validation messages
- Responsive Design Consistency: Standardized breakpoints and patterns
- CSS Quality: Clean, maintainable stylesheets
Priority: Medium-High (High for design system projects)
Justification:
- Fills significant gap in frontend code quality
- Essential for projects with design systems
- Prevents technical debt accumulation
- Improves accessibility compliance
- Aligns with modern frontend development practices
Notes:
- Design token validation requires design tokens to be defined in a parseable format
- Component library validation requires consistent naming conventions
- Bundle size monitoring requires a build step
- Accessibility checks complement but don't replace manual testing
- Framework-specific rules depend on ESLint plugins being installed
- Can be disabled for projects without design systems
- Works well with Enhancement #28 (Advanced Testing Types) - visual regression testing
- Related to Enhancement #18 (Advanced Code Quality Gates) - extends code quality to frontend specifics
- Can integrate with Enhancement #38 (MCP Server Management) - playwright MCP for visual validation
- Template-based init (Enhancement #14) can include framework-specific frontend configurations
Enhancement #30: Documentation-Driven Development
Status: 🔵 IDENTIFIED
Problem:
The AI-Augmented Solo Framework assumes developers start with Vision, PRD, and Architecture documents, but Solokit currently has no workflow to:
- Parse project documentation: Vision, PRD, Architecture docs exist but aren't used
- Generate work items from docs: Manual work item creation from 100+ page docs is tedious
- Maintain doc-code traceability: No link between code and original requirements
- Track architecture decisions: ADRs not captured or tracked
- Validate against architecture: Work items may violate architecture constraints
Example workflow gap:
Developer has:
- Vision.md (product vision)
- PRD.md (requirements, 50 pages)
- Architecture.md (system design)
Current process:
→ Manually read all docs
→ Manually create work items
→ Hope work items align with architecture
→ No traceability between code and requirements
Proposed Solution:
Implement documentation-driven development workflow that parses project docs and guides development:
- Document Parsing and Analysis
  - Parse Vision, PRD, Architecture, ADR documents
  - Extract requirements, user stories, architectural constraints
  - Build knowledge graph of project structure
- Smart Work Item Generation
  - Analyze documents and suggest work items
  - Prioritize based on dependencies and business value
  - Map work items to architecture components
  - Estimate complexity from requirements
- Architecture Decision Records (ADRs)
  - Template-based ADR creation
  - Link ADRs to work items
  - Track decision history and rationale
  - Validate work items against ADRs
- Document-to-Code Traceability
  - Link work items to requirements in docs
  - Track which code implements which requirement
  - Generate traceability matrix
- Architecture Validation
  - Validate work items against architecture constraints
  - Detect architecture violations
  - Suggest architecture updates when needed
- API-First Documentation System
  - Automated OpenAPI/Swagger generation from code annotations
  - Interactive API documentation (Swagger UI, Redoc, API Explorer)
  - API versioning and changelog automation
  - SDK generation for multiple languages (Python, TypeScript, Go, etc.)
  - API contract testing integration
  - Breaking change detection between API versions
  - API usage analytics and deprecation management
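The traceability matrix mentioned above can be pictured as a mapping from requirement IDs to the work items that implement them. The sketch below assumes each work item carries a `requirements` list; that field name and the `REQ-` ID format are hypothetical, since the traceability fields are not yet defined.

```python
from typing import Any, Dict, Iterable, List, Mapping


def build_traceability_matrix(
    requirement_ids: Iterable[str], work_items: Iterable[Mapping[str, Any]]
) -> Dict[str, List[str]]:
    """Map every requirement ID to the IDs of the work items that reference it."""
    matrix: Dict[str, List[str]] = {req: [] for req in requirement_ids}
    for item in work_items:
        for req in item.get("requirements", []):
            matrix.setdefault(req, []).append(item["id"])
    return matrix


def untraced(matrix: Mapping[str, List[str]]) -> List[str]:
    """Requirements that no work item covers yet."""
    return [req for req, items in matrix.items() if not items]
```

Requirements with an empty list are the gaps: documented in the PRD but not yet covered by any work item.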
Implementation:
Document parser:
# src/solokit/docs/parser.py
class DocumentParser:
    def parse_vision(self, vision_file):
        # Extract business goals, target users
        raise NotImplementedError

    def parse_prd(self, prd_file):
        # Extract requirements, user stories, acceptance criteria
        raise NotImplementedError

    def parse_architecture(self, arch_file):
        # Extract components, constraints, patterns
        raise NotImplementedError

    def parse_adrs(self, adr_dir):
        # Load all ADRs, build decision history
        raise NotImplementedError
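As one concrete slice of `parse_prd`, user stories written in the common "As a ..., I want ... so that ..." form could be pulled out of a Markdown PRD with a regular expression. This is a sketch under that assumption; real PRDs vary widely, and `extract_user_stories` is a hypothetical helper, not part of Solokit.

```python
import re
from typing import Dict, List, Optional

# Matches Markdown bullets such as "- As a user, I want X so that Y."
STORY = re.compile(
    r"^\s*[-*]\s*As an? (?P<role>.+?), I want (?P<goal>.+?)"
    r"(?: so that (?P<benefit>.+?))?\.?\s*$",
    re.IGNORECASE,
)


def extract_user_stories(markdown: str) -> List[Dict[str, Optional[str]]]:
    """Collect user stories, tagging each with the heading it appears under."""
    stories = []
    section: Optional[str] = None
    for line in markdown.splitlines():
        if line.startswith("#"):
            section = line.lstrip("#").strip()
            continue
        match = STORY.match(line)
        if match:
            stories.append({"section": section, **match.groupdict()})
    return stories
```

Each extracted story keeps its section heading, which gives a later step something to link a generated work item back to.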
Work item generator:
# src/solokit/work_items/generator.py
class WorkItemGenerator:
    def suggest_from_documents(self, docs):
        # Analyze docs, extract requirements
        # Generate work item suggestions
        # Prioritize and estimate
        raise NotImplementedError

    def map_to_architecture(self, work_items, architecture):
        # Map work items to arch components
        # Validate against constraints
        raise NotImplementedError
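The dependency part of prioritization could be handled with a topological sort, so that no work item is scheduled before the items it depends on. A minimal sketch using the standard library's `graphlib` (Python 3.9+); the input shape (work item ID mapped to the IDs it depends on) is an assumption, and business value is not considered here.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def order_work_items(dependencies: Dict[str, List[str]]) -> List[str]:
    """Return work item IDs so that every item comes after its dependencies.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

A business-value score could then be used to order items that are otherwise unconstrained relative to each other.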
API documentation generator:
# src/solokit/docs/api_doc_generator.py
class APIDocumentationGenerator:
    def generate_openapi_spec(self, codebase):
        # Scan code for API endpoints and annotations
        # Generate OpenAPI 3.0 specification
        # Include schemas, parameters, responses
        raise NotImplementedError

    def generate_interactive_docs(self, openapi_spec):
        # Generate Swagger UI / Redoc documentation
        # Set up API explorer with try-it-out functionality
        # Deploy to docs site
        raise NotImplementedError

    def generate_sdk(self, openapi_spec, languages):
        # Generate client SDKs from OpenAPI spec
        # Support Python, TypeScript, Go, Java, etc.
        # Include usage examples and tests
        raise NotImplementedError

    def detect_breaking_changes(self, old_spec, new_spec):
        # Compare API versions
        # Identify breaking changes (removed endpoints, changed schemas)
        # Generate migration guide
        raise NotImplementedError

    def track_api_versions(self):
        # Maintain API version history
        # Generate changelogs automatically
        # Mark deprecated endpoints
        raise NotImplementedError
API documentation example:
# Generated OpenAPI specification
openapi: 3.0.0
info:
  title: User Management API
  version: 2.1.0
  description: API for user authentication and profile management
paths:
  /api/v2/users:
    get:
      summary: List all users
      parameters:
        - name: limit
          in: query
          schema:
            type: integer
            default: 10
      responses:
        '200':
          description: List of users
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/User'
    post:
      summary: Create a new user
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/UserCreate'
      responses:
        '201':
          description: User created successfully
components:
  schemas:
    User:
      type: object
      properties:
        id:
          type: string
        email:
          type: string
        name:
          type: string
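Given two versions of a specification like the one above, already loaded into dictionaries, the simplest class of breaking change (a removed endpoint or HTTP method) can be found by comparing the `paths` sections. This sketch covers only that case; schema and parameter changes are out of scope, and `removed_operations` is a hypothetical name.

```python
from typing import Any, Dict, List

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options", "trace"}


def removed_operations(old_spec: Dict[str, Any], new_spec: Dict[str, Any]) -> List[str]:
    """List operations present in the old spec but missing from the new one."""
    removed = []
    new_paths = new_spec.get("paths", {})
    for path, path_item in old_spec.get("paths", {}).items():
        for method in path_item:
            # Path items can also hold non-operation keys such as "parameters".
            if method in HTTP_METHODS and method not in new_paths.get(path, {}):
                removed.append(f"{method.upper()} {path}")
    return sorted(removed)
```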
Commands:
# Parse docs and suggest work items
/sk:work-suggest --from-docs
# Create ADR for architectural decision
/sk:adr-new --title "Use PostgreSQL for primary database"
# Validate work item against architecture
/sk:work-validate <work-item-id> --architecture
# Generate traceability matrix
/sk:trace --requirements docs/PRD.md
# Generate API documentation
/sk:api-docs-generate [--output swagger|redoc|both]
# Generate SDK from API spec
/sk:api-sdk-generate --language [python|typescript|go|java]
# Check for breaking API changes
/sk:api-breaking-changes --compare v1.0.0..v2.0.0
ADR template:
# ADR-NNN: [Decision Title]
**Status:** Proposed | Accepted | Deprecated | Superseded
**Context:**
Why is this decision needed?
**Decision:**
What did we decide?
**Alternatives Considered:**
1. Option A - [pros/cons]
2. Option B - [pros/cons]
**Consequences:**
- Positive: [benefits]
- Negative: [trade-offs]
**Related Work Items:**
- feature_xxx
- bug_yyy
**References:**
- [External resources]
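A command such as `/sk:adr-new` would need to pick the next ADR number and a file name before filling in the template. A minimal sketch, assuming ADR files are named `ADR-NNN-<slug>.md` inside the ADR directory; that naming scheme and the `next_adr_path` helper are assumptions, not an established Solokit convention.

```python
import re
from pathlib import Path


def next_adr_path(adr_dir: Path, title: str) -> Path:
    """Build the path for a new ADR, numbering it after the highest existing one."""
    numbers = [
        int(match.group(1))
        for path in adr_dir.glob("ADR-*.md")
        if (match := re.match(r"ADR-(\d+)", path.name))
    ]
    number = max(numbers, default=0) + 1
    # Lowercase the title and collapse anything non-alphanumeric into hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return adr_dir / f"ADR-{number:03d}-{slug}.md"
```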
Files Affected:
New:
- src/solokit/docs/parser.py (will be created) - Document parsing
- src/solokit/docs/vision_parser.py (will be created) - Vision document parser
- src/solokit/docs/prd_parser.py (will be created) - PRD parser
- src/solokit/docs/architecture_parser.py (will be created) - Architecture parser
- src/solokit/docs/api_doc_generator.py (will be created) - API documentation generator
- src/solokit/work_items/generator.py (will be created) - Work item generator
- src/solokit/architecture/adr_manager.py (will be created) - ADR management
- src/solokit/architecture/validator.py (will be created) - Architecture validation
- src/solokit/traceability/tracker.py (will be created) - Requirement traceability
- src/solokit/api/openapi_generator.py (will be created) - OpenAPI specification generator
- src/solokit/api/sdk_generator.py (will be created) - Multi-language SDK generator
- src/solokit/api/breaking_change_detector.py (will be created) - API version comparator
- .claude/commands/work-suggest.md (will be created) - Work suggestion command
- .claude/commands/adr-new.md (will be created) - ADR creation command
- .claude/commands/api-docs-generate.md (will be created) - API docs generation command
- .claude/commands/api-sdk-generate.md (will be created) - SDK generation command
- .claude/commands/api-breaking-changes.md (will be created) - Breaking change detection command
- docs/adr/ (will be created) - ADR directory
- docs/api/ (will be created) - Generated API documentation
- Tests for document parsing and generation (will be created)
Modified:
- src/solokit/work_items/creator.py - Support generated work items
- src/solokit/work_items/spec_parser.py - Parse architecture constraints
- .session/tracking/work_items.json - Add traceability fields
Benefits:
- Faster planning: Auto-generate work items from docs
- Alignment: Work items guaranteed to match requirements
- Traceability: Know which code implements which requirement
- Architecture compliance: Work validated against architecture
- Decision history: ADRs track why decisions were made
- Knowledge capture: Documentation drives development
- API-first development: Automated API documentation from code
- Multi-language SDKs: Auto-generated client libraries
- API stability: Breaking change detection prevents client disruption
- Developer experience: Interactive API documentation and examples
Priority: High - Bridges gap between planning and implementation
Notes:
- Requires project documentation to exist (Vision, PRD, Architecture)
- Parser supports Markdown and common doc formats
- AI can assist with initial document creation if needed
Enhancement #31: AI-Enhanced Learning System
Status: 🔵 IDENTIFIED
Problem:
Current learning system uses keyword-based algorithms with limitations:
- Learning Curation (Deduplication):
  - Uses Jaccard similarity for duplicate detection
  - Misses semantically similar learnings with different wording
  - Example: "Use async/await for better performance" vs "Prefer promises over callbacks" are similar, but Jaccard similarity does not detect the overlap because they share no words
- Learning Relevance Scoring:
  - Uses keyword matching to find relevant learnings
  - Misses semantically related learnings
  - Example: Work item "Implement JWT authentication" → Learning "Always validate tokens on server side" scores low (no "JWT" keyword) but is highly relevant
- Learning Categorization:
  - Keyword-based category assignment
  - May miscategorize learnings with ambiguous keywords
  - Example: "Cache invalidation is hard" → Could be "performance" or "architecture" or "gotchas"
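To make the deduplication gap concrete, here is a minimal sketch of word-level Jaccard similarity applied to the example pair above. The tokenization (lowercase, whitespace split) and the function name `jaccard_similarity` are assumptions for illustration; Solokit's actual keyword-based implementation may differ.

```python
def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Word-level Jaccard similarity: |A intersect B| / |A union B|."""
    # Assumed tokenization: lowercase and split on whitespace.
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a and not words_b:
        return 1.0  # Two empty texts are treated as identical.
    return len(words_a & words_b) / len(words_a | words_b)


score = jaccard_similarity(
    "Use async/await for better performance",
    "Prefer promises over callbacks",
)
print(score)  # 0.0: the two learnings share no words
```

Because the score depends only on shared tokens, any threshold above zero treats this pair as unrelated, which is the gap the semantic approach is meant to close.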
Proposed Solution:
Implement AI-powered learning system using Claude API for semantic understanding:
- AI-Powered Deduplication
  - Use Claude API to detect semantically similar learnings
  - Understand meaning, not just word overlap
  - Smarter merging of similar learnings
  - Preserve unique insights
- Semantic Relevance Scoring
  - Use Claude API to score learning relevance to work items
  - Understand context and semantic relationships
  - Find relevant learnings even without keyword matches
  - Context-aware recommendations
- Intelligent Categorization
  - Use Claude API to categorize learnings
  - Understand nuance and context
  - Multi-category support (learning can fit multiple categories)
  - Confidence scores for categories
- Learning Summarization
  - Generate summaries of long learnings
  - Extract key insights
  - Create learning digests
- Learning Relationships
  - Detect relationships between learnings
  - Build knowledge graph
  - Suggest related learnings
Implementation:
AI-powered learning curator:
# src/solokit/learning/ai_curator.py
import json
import os
from typing import Any, Dict, List, Optional, Tuple

import anthropic


class AILearningCurator:
    """AI-powered learning curation using Claude API"""

    def __init__(self, api_key: Optional[str] = None):
        # Fall back to the ANTHROPIC_API_KEY environment variable
        self.client = anthropic.Anthropic(api_key=api_key or os.environ.get("ANTHROPIC_API_KEY"))
        self.model = "claude-sonnet-4-5-20250929"
    def detect_semantic_similarity(
        self,
        learning1: Dict[str, Any],
        learning2: Dict[str, Any]
    ) -> Tuple[bool, float, str]:
        """
        Detect if two learnings are semantically similar.

        Returns:
            (is_similar, similarity_score, reasoning)
        """
        prompt = f"""Analyze if these two learnings are semantically similar:
Learning 1: {learning1['content']}
Category 1: {learning1.get('category', 'unknown')}
Learning 2: {learning2['content']}
Category 2: {learning2.get('category', 'unknown')}
Respond in JSON format:
{{
"similar": true/false,
"similarity_score": 0.0-1.0,
"reasoning": "brief explanation",
"recommendation": "keep_both" | "merge" | "mark_as_related"
}}
Consider:
- Do they convey the same core insight?
- Are they about the same problem/solution?
- Would a developer benefit from seeing both separately?
"""
        response = self.client.messages.create(
            model=self.model,
            max_tokens=500,
            messages=[{"role": "user", "content": prompt}]
        )
        result = json.loads(response.content[0].text)
        return result["similar"], result["similarity_score"], result["reasoning"]
    def score_learning_relevance(
        self,
        learning: Dict[str, Any],
        work_item_title: str,
        work_item_spec: str,
        work_item_type: str
    ) -> Tuple[float, str]:
        """
        Score how relevant a learning is to a work item.

        Returns:
            (relevance_score, reasoning)
        """
        prompt = f"""Rate how relevant this learning is to the work item (0.0-1.0):
Work Item:
- Title: {work_item_title}
- Type: {work_item_type}
- Spec: {work_item_spec[:500]}...
Learning: {learning['content']}
Category: {learning.get('category', 'unknown')}
Respond in JSON format:
{{
"relevance_score": 0.0-1.0,
"reasoning": "brief explanation of why/why not relevant",
"key_connections": ["connection 1", "connection 2"]
}}
Consider:
- Does it help solve the work item's problem?
- Does it prevent common mistakes in this type of work?
- Is it about related technologies/patterns?
- Would a developer benefit from knowing this?
"""
        response = self.client.messages.create(
            model=self.model,
            max_tokens=500,
            messages=[{"role": "user", "content": prompt}]
        )
        result = json.loads(response.content[0].text)
        return result["relevance_score"], result["reasoning"]
    def categorize_learning(
        self,
        learning_content: str
    ) -> List[Tuple[str, float]]:
        """
        Categorize a learning using AI.

        Returns:
            List of (category, confidence) tuples
        """
        categories = [
            "architecture", "gotchas", "best_practices",
            "technical_debt", "performance", "security"
        ]
        prompt = f"""Categorize this learning. It may fit multiple categories.
Learning: {learning_content}
Available categories:
- architecture: Architectural patterns and design decisions
- gotchas: Pitfalls, traps, common mistakes
- best_practices: Conventions, standards, recommendations
- technical_debt: Refactoring needs, workarounds, TODOs
- performance: Optimization insights, benchmarks
- security: Security considerations and hardening
Respond in JSON format:
{{
"categories": [
{{"name": "category", "confidence": 0.0-1.0, "reasoning": "brief explanation"}},
...
],
"primary_category": "category"
}}
"""
        response = self.client.messages.create(
            model=self.model,
            max_tokens=500,
            messages=[{"role": "user", "content": prompt}]
        )
        result = json.loads(response.content[0].text)
        return [
            (cat["name"], cat["confidence"])
            for cat in result["categories"]
        ]
    def summarize_learning(
        self,
        learning_content: str,
        max_length: int = 80
    ) -> str:
        """Generate a concise summary of a learning"""
        prompt = f"""Summarize this learning in {max_length} characters or less:
Learning: {learning_content}
Provide a concise summary that captures the key insight.
"""
        response = self.client.messages.create(
            model=self.model,
            max_tokens=100,
            messages=[{"role": "user", "content": prompt}]
        )
        summary = response.content[0].text.strip()
        # Hard-truncate in case the model exceeds the requested length
        return summary[:max_length]
    def suggest_merge(
        self,
        learnings: List[Dict[str, Any]]
    ) -> str:
        """Suggest how to merge similar learnings"""
        prompt = f"""These learnings are similar. Suggest how to merge them into one comprehensive learning:
Learnings:
{json.dumps([l['content'] for l in learnings], indent=2)}
Provide a merged learning that:
1. Captures all unique insights
2. Is concise and clear
3. Preserves important details
4. Uses consistent terminology
"""
        response = self.client.messages.create(
            model=self.model,
            max_tokens=500,
            messages=[{"role": "user", "content": prompt}]
        )
        return response.content[0].text.strip()
    def find_related_learnings(
        self,
        learning: Dict[str, Any],
        all_learnings: List[Dict[str, Any]],
        limit: int = 5
    ) -> List[Tuple[Dict, float, str]]:
        """Find learnings related to this one"""
        # Only the first 20 candidates are sent, to keep the prompt small
        candidates = all_learnings[:20]
        prompt = f"""Find learnings related to this one:
Main Learning: {learning['content']}
Other Learnings:
{json.dumps([{"id": i, "content": l['content']} for i, l in enumerate(candidates)], indent=2)}
Respond in JSON format:
{{
"related": [
{{
"id": learning_id,
"relatedness": 0.0-1.0,
"relationship": "brief description of how they relate"
}},
...
]
}}
"""
        response = self.client.messages.create(
            model=self.model,
            max_tokens=800,
            messages=[{"role": "user", "content": prompt}]
        )
        result = json.loads(response.content[0].text)
        # Skip ids that do not refer to a candidate that was actually sent
        related = [
            (candidates[r["id"]], r["relatedness"], r["relationship"])
            for r in result["related"]
            if isinstance(r.get("id"), int) and 0 <= r["id"] < len(candidates)
        ]
        return sorted(related, key=lambda x: x[1], reverse=True)[:limit]
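The methods above pass the model's reply straight to `json.loads`, which raises if the reply wraps the JSON in a code fence or adds surrounding prose. A defensive parsing step could look like the sketch below. `extract_json` is a hypothetical helper name, not part of the Anthropic SDK, and the "first `{` to last `}`" heuristic is an assumption that only suits replies containing a single JSON object.

```python
import json
import re
from typing import Any


def extract_json(reply_text: str) -> Any:
    """Parse the JSON payload of a model reply (hypothetical helper).

    Tolerates Markdown code fences and prose around a single JSON object.
    """
    # Fast path: the reply is already pure JSON.
    try:
        return json.loads(reply_text)
    except json.JSONDecodeError:
        pass
    # Fallback: take the span from the first "{" to the last "}".
    match = re.search(r"\{.*\}", reply_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("No JSON object found in model reply")
    return json.loads(match.group(0))


print(extract_json('Sure, here it is: {"similar": true, "similarity_score": 0.9}'))
```

Each `json.loads(response.content[0].text)` call in the curator could then be routed through such a helper, so that a malformed reply surfaces as one well-defined error.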
Enhanced learning curator:
# src/solokit/learning/curator.py (enhanced)
import os
from pathlib import Path
from typing import Dict, List, Optional

from solokit.learning.ai_curator import AILearningCurator


class LearningCurator:
    def __init__(self):
        self.learnings_file = Path(".session/tracking/learnings.json")
        self.ai_curator = AILearningCurator() if self.has_api_key() else None

    def has_api_key(self) -> bool:
        """Check if Anthropic API key is available"""
        return bool(os.environ.get("ANTHROPIC_API_KEY"))

    def curate_learnings(self, use_ai: bool = True):
        """Curate learnings with optional AI enhancement"""
        learnings = self.load_learnings()
        if use_ai and self.ai_curator:
            print("Using AI-powered curation...")
            self.ai_curate_learnings(learnings)
        else:
            print("Using keyword-based curation...")
            self.keyword_curate_learnings(learnings)
    def ai_curate_learnings(self, learnings: List[Dict]):
        """AI-powered curation"""
        # 1. Categorize uncategorized learnings
        for learning in learnings:
            if not learning.get("category"):
                categories = self.ai_curator.categorize_learning(learning["content"])
                if categories:
                    # Primary category = highest confidence (list order is not guaranteed)
                    learning["category"] = max(categories, key=lambda c: c[1])[0]
                    learning["categories_all"] = categories  # All categories with confidence

        # 2. Find and merge similar learnings (one API call per pair)
        merged_count = 0
        i = 0
        while i < len(learnings):
            j = i + 1
            while j < len(learnings):
                similar, score, reasoning = self.ai_curator.detect_semantic_similarity(
                    learnings[i], learnings[j]
                )
                if similar and score > 0.8:
                    # Merge learnings
                    merged_content = self.ai_curator.suggest_merge([learnings[i], learnings[j]])
                    learnings[i]["content"] = merged_content
                    learnings[i]["merged_from"] = learnings[i].get("merged_from", []) + [learnings[j]["id"]]
                    learnings.pop(j)  # j now points at the next candidate
                    merged_count += 1
                    print(f"Merged similar learnings: {reasoning}")
                else:
                    j += 1
            i += 1
        print(f"Merged {merged_count} similar learnings")

        # 3. Find learning relationships
        for i, learning in enumerate(learnings):
            related = self.ai_curator.find_related_learnings(
                learning, learnings[:i] + learnings[i+1:], limit=3
            )
            learning["related_learnings"] = [
                {"id": r[0]["id"], "relationship": r[2]}
                for r in related
            ]
        self.save_learnings(learnings)
    def semantic_search(
        self,
        query: str,
        limit: int = 10,
        category: Optional[str] = None
    ) -> List[Dict]:
        """Semantic search using AI"""
        if not self.ai_curator:
            # No API key configured, so AI scoring is unavailable
            raise RuntimeError("Semantic search requires ANTHROPIC_API_KEY")
        learnings = self.load_learnings()
        if category:
            learnings = [l for l in learnings if l.get("category") == category]

        # Use AI to score relevance (one API call per learning)
        scored_learnings = []
        for learning in learnings:
            score, reasoning = self.ai_curator.score_learning_relevance(
                learning,
                work_item_title=query,
                work_item_spec=query,
                work_item_type="feature"
            )
            scored_learnings.append((learning, score, reasoning))

        # Sort by relevance
        scored_learnings.sort(key=lambda x: x[1], reverse=True)
        return [
            {**l[0], "relevance_score": l[1], "relevance_reasoning": l[2]}
            for l in scored_learnings[:limit]
        ]
Enhanced session briefing:
# src/solokit/session/briefing/learning_loader.py (enhanced)
from pathlib import Path
from typing import Dict, List


def get_relevant_learnings_ai(work_item_id: str, limit: int = 10) -> List[Dict]:
    """Get relevant learnings using AI-powered scoring"""
    from solokit.work_items.repository import WorkItemRepository
    from solokit.learning.repository import LearningRepository

    # Get work item
    repository = WorkItemRepository()
    work_item = repository.get_work_item(work_item_id)

    # Get spec
    spec_path = Path(f".session/specs/{work_item_id}.md")
    spec_content = spec_path.read_text() if spec_path.exists() else ""

    # Get all learnings
    learning_repo = LearningRepository()
    learnings = learning_repo.load_learnings()

    if learning_repo.ai_curator:
        # Use AI scoring (one API call per learning)
        scored_learnings = []
        for learning in learnings:
            score, reasoning = learning_repo.ai_curator.score_learning_relevance(
                learning,
                work_item_title=work_item["title"],
                work_item_spec=spec_content,
                work_item_type=work_item["type"]
            )
            scored_learnings.append((learning, score, reasoning))

        # Sort by relevance
        scored_learnings.sort(key=lambda x: x[1], reverse=True)
        return [
            {**l[0], "relevance_score": l[1], "relevance_reasoning": l[2]}
            for l in scored_learnings[:limit]
        ]
    else:
        # Fallback to keyword-based scoring
        return get_relevant_learnings(work_item_id, limit)
Configuration:
// .session/config.json
{
  "learning_system": {
    "use_ai_curation": true,
    "use_ai_relevance": true,
    "ai_curation_frequency": "weekly", // or "every_n_sessions": 5
    "semantic_search_enabled": true,
    "min_similarity_threshold": 0.8,
    "api_provider": "anthropic",
    "model": "claude-sonnet-4-5-20250929"
  }
}
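As a rough illustration of how these settings might be read, the sketch below merges the `learning_system` block over a set of defaults. The function names and the default values (AI features off unless enabled) are assumptions, not Solokit's actual behavior. Note that Python's `json` module rejects `//` comments, so the file on disk would need to omit the annotations shown above.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Assumed defaults: AI features stay off unless the project opts in.
DEFAULT_LEARNING_CONFIG: Dict[str, Any] = {
    "use_ai_curation": False,
    "use_ai_relevance": False,
    "semantic_search_enabled": False,
    "min_similarity_threshold": 0.8,
}


def resolve_learning_config(raw_config: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay the learning_system block on top of the defaults."""
    return {**DEFAULT_LEARNING_CONFIG, **raw_config.get("learning_system", {})}


def load_learning_config(config_path: Path) -> Dict[str, Any]:
    """Read .session/config.json; a missing file yields the defaults."""
    if not config_path.exists():
        return dict(DEFAULT_LEARNING_CONFIG)
    raw_config = json.loads(config_path.read_text(encoding="utf-8"))
    return resolve_learning_config(raw_config)
```

With a loader like this, the hard-coded `0.8` merge threshold in the curator could be taken from `min_similarity_threshold` instead.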
Commands:
# Use AI curation
/sk:learn-curate --ai
# Semantic search
/sk:learn-search "authentication" --semantic
# Find related learnings
/sk:learn-related <learning_id>
Files Affected:
New:
- src/solokit/learning/ai_curator.py (will be created) - AI-powered curation
- tests/unit/test_ai_curator.py (will be created) - AI curator tests
- tests/fixtures/sample_learnings.json (will be created) - Test learnings
Modified:
- src/solokit/learning/curator.py - Integrate AI curation
- src/solokit/learning/repository.py - Integrate AI search
- src/solokit/session/briefing/learning_loader.py - Use AI relevance scoring
- .session/config.json - Add AI learning configuration
- pyproject.toml - Add anthropic SDK dependency
- .claude/commands/learn-curate.md - Document AI curation
- .claude/commands/learn-search.md - Document semantic search
Benefits:
- Better deduplication: Catches semantically similar learnings
- Smarter relevance: Finds relevant learnings without keyword matches
- Improved categorization: Understands nuance and context
- Knowledge graph: Relationships between learnings
- Summarization: Concise summaries for quick scanning
- Higher quality: Cleaner, more useful knowledge base
- Better context loading: More relevant learnings in session briefings
- Learning evolution: Merge and refine learnings over time
Priority: High - Enhances core Solokit feature (learning system)
Notes:
- Requires Anthropic API key (set via ANTHROPIC_API_KEY env variable)
- Graceful fallback to keyword-based methods if API key not available
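- API costs should be monitored: curation runs infrequently, but deduplication compares learnings pairwise, and relevance scoring and semantic search make one API call per learning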
- Can be disabled per project if API access not desired
- Considers privacy: learnings are stored locally and are sent to the API only when AI curation, relevance scoring, or semantic search runs
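To get a feel for how API usage grows with the size of the knowledge base, the following back-of-the-envelope sketch counts calls for one curation pass as written above. The function names are illustrative, merge calls are excluded, and the pairwise figure is an upper bound, since each merge shrinks the list.

```python
def pairwise_comparisons(n_learnings: int) -> int:
    """Upper bound on similarity calls for all-pairs deduplication."""
    return n_learnings * (n_learnings - 1) // 2


def estimate_curation_calls(n_learnings: int, n_uncategorized: int = 0) -> int:
    """Rough upper bound on API calls for one AI curation pass."""
    # One call per uncategorized learning, one per pair for deduplication,
    # and one per learning for relationship discovery.
    return n_uncategorized + pairwise_comparisons(n_learnings) + n_learnings


for n in (10, 50, 200):
    print(n, estimate_curation_calls(n))
# 10 55
# 50 1275
# 200 20100
```

The quadratic term dominates quickly, so a cheap pre-filter (for example, comparing only learnings in the same category) may be worth considering before the pairwise step.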
Enhancement #32: Continuous Improvement System
Status: 🔵 IDENTIFIED
Problem:
Development processes don't improve over time. No mechanism to:
- Learn from work items: Patterns and lessons lost
- Track technical debt: Debt accumulates unnoticed
- Measure velocity: Don't know if getting faster or slower
- Identify bottlenecks: Process inefficiencies unknown
- Optimize workflows: No data-driven improvements
Example:
Work item completed → Next work item started
→ No reflection on what worked/didn't work
→ Same issues repeat
→ No improvement
Proposed Solution:
Implement continuous improvement system that tracks metrics and suggests optimizations:
- Automated Retrospectives
  - After each work item or milestone, generate retrospective
  - Analyze what went well, what didn't
  - Track lessons learned
  - Suggest improvements
- Technical Debt Tracking
  - Identify technical debt during development
  - Track debt accumulation over time
  - Prioritize debt paydown
  - Measure debt ratio
- DORA Metrics Dashboard
  - Deployment frequency: How often deploying
  - Lead time: Time from commit to production
  - Change failure rate: % of deployments that fail
  - Mean time to recovery (MTTR): Time to fix production issues
- Velocity and Cycle Time Tracking
  - Track work item completion time
  - Measure velocity (story points/week)
  - Identify slowdowns
  - Trend analysis
- Process Optimization Recommendations
  - Analyze bottlenecks in workflow
  - Suggest process improvements
  - A/B test process changes
  - Measure impact of improvements
Implementation:
Retrospective generator:
# src/solokit/improvement/retrospective.py
class RetrospectiveGenerator:
    def generate_retrospective(self, work_item):
        # Analyze work item history
        # Generate retrospective questions
        # Track patterns
        raise NotImplementedError

    def suggest_improvements(self, retrospectives):
        # Analyze multiple retrospectives
        # Identify recurring issues
        # Suggest improvements
        raise NotImplementedError
Technical debt tracker:
# src/solokit/improvement/debt_tracker.py
class TechnicalDebtTracker:
    def identify_debt(self, codebase):
        # Detect code smells
        # Find TODOs and FIXMEs
        # Measure code complexity
        raise NotImplementedError

    def calculate_debt_ratio(self):
        # Debt ratio = debt / total code
        # Track over time
        raise NotImplementedError
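One simple way to approximate the "Find TODOs and FIXMEs" step and the debt ratio is to count marker comments per line of code. The sketch below is only an illustration under that assumption: the function names are hypothetical, and treating one marker line as one unit of debt is a deliberate simplification rather than the planned metric.

```python
import re
from typing import Dict, Iterable

# Assumed debt markers, taken from the "Find TODOs and FIXMEs" step.
DEBT_MARKER = re.compile(r"\b(TODO|FIXME)\b")


def count_debt_markers(lines: Iterable[str]) -> Dict[str, int]:
    """Count TODO and FIXME markers across source lines."""
    counts: Dict[str, int] = {}
    for line in lines:
        for marker in DEBT_MARKER.findall(line):
            counts[marker] = counts.get(marker, 0) + 1
    return counts


def debt_ratio(debt_lines: int, total_lines: int) -> float:
    """Debt ratio = debt / total code; 0.0 for an empty codebase."""
    return debt_lines / total_lines if total_lines else 0.0


source = [
    "def pay(order):",
    "    # TODO: handle refunds",
    "    return charge(order)  # FIXME: retries",
    "",
]
markers = count_debt_markers(source)
print(markers, debt_ratio(sum(markers.values()), len(source)))
```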
DORA metrics:
# src/solokit/improvement/dora_metrics.py
class DORAMetrics:
    def deployment_frequency(self):
        # Count deployments per day/week
        raise NotImplementedError

    def lead_time(self):
        # Time from commit to production
        raise NotImplementedError

    def change_failure_rate(self):
        # Failed deployments / total deployments
        raise NotImplementedError

    def mean_time_to_recovery(self):
        # Average time to fix production issues
        raise NotImplementedError
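The four metrics can be computed from a list of deployment records. The sketch below assumes a hypothetical `Deployment` record (its field names are not defined anywhere in Solokit) and non-empty inputs; it is meant to show the arithmetic, not the final design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical deployment record; field names are assumptions."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    recovered_at: Optional[datetime] = None  # Set once a failure is fixed.


def deployment_frequency(deploys: List[Deployment], weeks: float) -> float:
    """Deployments per week over the observed window."""
    return len(deploys) / weeks


def lead_time(deploys: List[Deployment]) -> timedelta:
    """Mean time from commit to production."""
    total = sum((d.deployed_at - d.committed_at for d in deploys), timedelta())
    return total / len(deploys)


def change_failure_rate(deploys: List[Deployment]) -> float:
    """Failed deployments divided by total deployments."""
    return sum(d.failed for d in deploys) / len(deploys)


def mean_time_to_recovery(deploys: List[Deployment]) -> timedelta:
    """Mean time from a failed deployment to its recovery."""
    durations = [
        d.recovered_at - d.deployed_at
        for d in deploys
        if d.failed and d.recovered_at is not None
    ]
    return sum(durations, timedelta()) / len(durations)
```

For a solo project, the records could plausibly be derived from git tags or CI runs; where that data comes from is left open here.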
Dashboard:
# /sk:status --project
## DORA Metrics
- Deployment Frequency: 3.2/week (↑ from 2.5)
- Lead Time: 2.3 days (↓ from 3.1 days)
- Change Failure Rate: 8% (target: <15%)
- MTTR: 1.2 hours (↓ from 2.5 hours)
## Velocity
- Current: 21 story points/week
- Trend: ↑ 15% over last month
- Average cycle time: 2.1 days
## Technical Debt
- Debt Ratio: 12% (target: <15%)
- High-priority debt items: 3
- Debt added this week: 2 items
- Debt resolved this week: 4 items
## Process Insights
- Bottleneck: Integration testing (avg 45 min)
- Suggestion: Parallelize integration tests
- Improvement opportunity: Automate deployment rollback
Retrospective format:
# Retrospective: feature_user_authentication
**What Went Well:**
- TDD approach caught edge cases early
- Performance testing revealed bottleneck before production
- Documentation was comprehensive
**What Didn't Go Well:**
- Integration tests took 45 minutes (too slow)
- Had to refactor authentication logic twice
- Missing error handling for edge case
**Lessons Learned:**
- Always consider rate limiting from the start
- Test with realistic data volumes
**Action Items:**
- [ ] Speed up integration tests (parallelize)
- [ ] Add rate limiting to API design checklist
- [ ] Create authentication patterns library
**Metrics:**
- Cycle time: 3.2 days
- Test coverage: 92%
- Refactoring events: 2
Files Affected:
New:
- src/solokit/improvement/retrospective.py (will be created) - Retrospective generation
- src/solokit/improvement/debt_tracker.py (will be created) - Technical debt tracking
- src/solokit/improvement/dora_metrics.py (will be created) - DORA metrics calculation
- src/solokit/improvement/velocity_tracker.py (will be created) - Velocity tracking
- src/solokit/improvement/bottleneck_analyzer.py (will be created) - Bottleneck detection
- .session/tracking/retrospectives/ (will be created) - Retrospective storage
- .session/tracking/metrics.json (will be created) - Metrics history
- Tests for improvement modules (will be created)
Modified:
- src/solokit/session/complete.py - Generate retrospective on work item completion
- .claude/commands/status.md - Add project-level status command
- .session/tracking/work_items.json - Add cycle time tracking
Benefits:
- Continuous learning: Learn from every work item
- Debt management: Technical debt tracked and managed
- Velocity visibility: Know if improving or slowing down
- Data-driven decisions: Optimize based on metrics
- Process improvement: Systematically improve workflow
- Team-level insights: Solo developer with team-level metrics
Priority: Medium - Important for long-term productivity
Enhancement #33: Performance Testing Framework
Status: 🔵 IDENTIFIED
Problem:
Performance issues are discovered in production, not development:
- No performance baselines: Don't know expected performance
- No load testing: System untested under realistic load
- No regression detection: Performance degradations unnoticed
- No bottleneck identification: Slow endpoints unknown
Example:
Feature added → All tests pass ✓ → Deploy
→ Production: 5s response times ✗
→ Users complain
→ No baseline to compare
Proposed Solution:
Implement comprehensive performance testing framework:
- Performance Benchmarks in Specs
  - Define performance requirements in work items
  - Example: "API must respond in <200ms at p95"
  - Enforce benchmarks before merge
- Automated Load Testing
  - Run load tests in CI/CD
  - Tools: k6, wrk, Gatling, Locust
  - Test realistic traffic patterns
- Performance Regression Detection
  - Compare results against baseline
  - Fail if performance degrades >10%
  - Track performance over time
- Bottleneck Identification
  - Profile slow endpoints
  - Identify database query issues
  - Find N+1 queries, missing indexes
- Performance Baseline Tracking
  - Store baselines in .session/tracking/performance_baselines.json
  - Update baselines when performance improves
  - Historical performance charts
Implementation:
Performance spec in work item:
## Performance Requirements
**Response Time Targets:**
- GET /api/users: <100ms (p50), <200ms (p95)
- POST /api/orders: <500ms (p50), <1s (p95)
- Database queries: <50ms average
**Throughput Targets:**
- 1000 requests/second sustained
- 5000 concurrent users
**Resource Limits:**
- Memory: <512MB
- CPU: <50% average
Load testing:
# src/solokit/testing/load_tester.py (will be created)
class LoadTester:
    def run_load_test(self, work_item):
        # Extract performance requirements
        # Run k6/wrk load test
        # Compare against baseline
        # Return pass/fail + metrics
        raise NotImplementedError

    def detect_regression(self, current, baseline):
        # Compare metrics
        # Fail if >10% slower
        raise NotImplementedError
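The "fail if >10% slower" rule can be sketched as a relative comparison per latency metric. The standalone function below is an assumption about the eventual shape of `detect_regression`: it takes plain dictionaries of millisecond values keyed by metric name, and the `tolerance` parameter is introduced here for illustration.

```python
from typing import Dict, List


def detect_regression(
    current: Dict[str, float],
    baseline: Dict[str, float],
    tolerance: float = 0.10,
) -> List[str]:
    """Return the metrics that are more than `tolerance` slower than baseline.

    Both dicts map a latency metric name (e.g. "p50", "p95") to milliseconds,
    so lower is better. Metrics missing from `current` are skipped.
    """
    regressions = []
    for name, base_value in baseline.items():
        value = current.get(name)
        if value is None or base_value <= 0:
            continue
        # Relative slowdown compared with the stored baseline.
        if (value - base_value) / base_value > tolerance:
            regressions.append(name)
    return regressions


baseline = {"p50": 85, "p95": 180}
current = {"p50": 90, "p95": 210}
print(detect_regression(current, baseline))  # ['p95']
```

Here p50 is about 6% slower and passes, while p95 is about 17% slower and is reported. A quality gate would fail when the returned list is non-empty.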
k6 test generation:
// tests/performance/api_test.js (auto-generated)
import http from 'k6/http';
import { check } from 'k6';

export let options = {
  stages: [
    { duration: '2m', target: 100 }, // Ramp to 100 users
    { duration: '5m', target: 100 }, // Stay at 100
    { duration: '2m', target: 0 }, // Ramp down
  ],
  thresholds: {
    'http_req_duration': ['p(95)<200'], // 95% requests <200ms
  },
};

export default function() {
  let res = http.get('http://localhost:3000/api/users');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time OK': (r) => r.timings.duration < 200,
  });
}
Baseline tracking:
// .session/tracking/performance_baselines.json
{
  "endpoints": {
    "/api/users": {
      "p50": 85,
      "p95": 180,
      "last_updated": "2025-10-29",
      "session": "session_015"
    }
  }
}
Files Affected:
New:
- src/solokit/testing/load_tester.py (will be created) - Load testing orchestration
- src/solokit/testing/baseline_manager.py (will be created) - Baseline tracking
- src/solokit/testing/regression_detector.py (will be created) - Regression detection
- src/solokit/testing/profiler.py (will be created) - Performance profiling
- tests/performance/ (will be created) - Generated load tests
- .session/tracking/performance_baselines.json (will be created) - Baseline storage
- Tests for performance framework (will be created)
Modified:
- src/solokit/quality/gates.py - Add performance gates
- src/solokit/work_items/spec_parser.py - Parse performance requirements
- .session/config.json - Performance testing configuration
- CI/CD workflows - Add performance testing job
Benefits:
- Prevent regressions: Catch slowdowns before production
- Meet SLAs: Enforce performance requirements
- Capacity planning: Know system limits
- Bottleneck identification: Find and fix slow code
- Performance visibility: Track performance over time
Priority: High - Performance issues cause production incidents
Enhancement #34: Operations & Observability
Status: 🔵 IDENTIFIED
Problem:
After deployment, there's no operational support infrastructure:
- No health monitoring: Can't tell if service is healthy
- No incident detection: Issues discovered by users, not monitoring
- No performance dashboards: Can't see system performance
- No capacity planning: Don't know when to scale
- No alert management: Alerts missing or too noisy
Example:
Deploy to production → ✓ Service running
→ Database runs out of connections ✗
→ No alert
→ Users report errors
→ 2 hours to discover issue
Proposed Solution:
Implement comprehensive operations and observability infrastructure:
- Health Check Monitoring
  - Monitor /health endpoint continuously
  - Alert on failures
  - Track uptime metrics
  - Integration with UptimeRobot, Pingdom, Datadog
- Incident Detection and Response
  - Automatic incident creation on alerts
  - Incident runbooks linked to alerts
  - PagerDuty/Opsgenie integration
  - Incident timeline and resolution tracking
- Performance Metrics Dashboards
  - Real-time metrics visualization
  - Request rates, latency, error rates
  - Database performance metrics
  - Infrastructure metrics (CPU, memory, disk)
  - Tools: Grafana, Datadog, New Relic
- Capacity Planning
  - Track resource usage trends
  - Predict when scaling needed
  - Cost optimization recommendations
  - Alert on approaching limits
- Intelligent Alerting
  - Reduce alert noise (no alert fatigue)
  - Alert prioritization (critical vs warning)
  - Alert aggregation and correlation
  - Alert routing and escalation
Implementation:
Health monitoring:
# src/solokit/operations/health_monitor.py
class HealthMonitor:
    def setup_monitoring(self, endpoints):
        # Configure health check monitoring
        # Set up alerts
        raise NotImplementedError

    def check_health(self):
        # Poll health endpoints
        # Detect failures
        # Create incidents
        raise NotImplementedError
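A single polling pass over the health endpoints might look like the sketch below, which uses only the standard library. The function names and the "healthy means HTTP 200" rule are assumptions; the status fetcher is injectable so the logic can be exercised without a network.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # Server answered with a 4xx/5xx status.
    except OSError:
        return 0  # DNS failure, refused connection, or timeout.


def check_health(
    endpoints: List[str],
    fetch_status: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Poll each endpoint once; an endpoint is healthy only on HTTP 200."""
    return {url: fetch_status(url) == 200 for url in endpoints}
```

A caller could run `check_health` on a schedule and hand any endpoint mapped to `False` to the incident manager.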
Incident management:
# src/solokit/operations/incident_manager.py
class IncidentManager:
    def create_incident(self, alert):
        # Create incident from alert
        # Link to runbook
        # Notify on-call
        raise NotImplementedError

    def track_incident(self, incident_id):
        # Track resolution steps
        # Update timeline
        raise NotImplementedError
Dashboards:
# monitoring/dashboards/api_dashboard.yml
dashboard:
  title: "API Performance"
  panels:
    - title: "Request Rate"
      metric: "http_requests_total"
    - title: "Response Time (p95)"
      metric: "http_request_duration_p95"
    - title: "Error Rate"
      metric: "http_errors_total / http_requests_total"
    - title: "Database Connections"
      metric: "db_connections_active"
Alert configuration:
# monitoring/alerts/api_alerts.yml
alerts:
  - name: "High Error Rate"
    condition: "error_rate > 5%"
    severity: "critical"
    notify: ["email", "pagerduty"]
  - name: "Slow Response Time"
    condition: "p95_latency > 1s"
    severity: "warning"
    notify: ["email"]
  - name: "Database Connection Pool Exhausted"
    condition: "db_connections > 90%"
    severity: "critical"
    runbook: "docs/runbooks/db_connections.md"
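To illustrate how conditions such as `error_rate > 5%` could be evaluated, the sketch below parses a small assumed grammar (`<metric> <operator> <number><optional unit>`) and compares it with current metric values. The grammar, the function name, and the rule that metric values use the same unit as the threshold are all assumptions; the alert entries are plain dictionaries as they would appear after loading the YAML.

```python
import operator
import re
from typing import Dict, List

# Assumed condition grammar: "<metric> <operator> <number><optional unit>".
CONDITION = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<)\s*([\d.]+)\s*\S*\s*$")
OPERATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def firing_alerts(alerts: List[dict], metrics: Dict[str, float]) -> List[str]:
    """Return the names of alerts whose condition holds for `metrics`.

    Metric values must use the same unit as the threshold in the condition
    (for example, percent for "error_rate > 5%"). Alerts whose metric is
    absent from `metrics` are skipped.
    """
    fired = []
    for alert in alerts:
        match = CONDITION.match(alert["condition"])
        if match is None:
            raise ValueError(f"Unsupported condition: {alert['condition']!r}")
        metric, op, threshold = match.groups()
        if metric in metrics and OPERATORS[op](metrics[metric], float(threshold)):
            fired.append(alert["name"])
    return fired
```

Keeping the unit out of the comparison is a shortcut; a fuller implementation would probably normalize units before comparing.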
Files Affected:
New:
- src/solokit/operations/health_monitor.py (will be created) - Health monitoring
- src/solokit/operations/incident_manager.py (will be created) - Incident management
- src/solokit/operations/metrics_collector.py (will be created) - Metrics collection
- src/solokit/operations/capacity_planner.py (will be created) - Capacity planning
- src/solokit/operations/alert_manager.py (will be created) - Alert management
- monitoring/dashboards/ (will be created) - Dashboard configurations
- monitoring/alerts/ (will be created) - Alert configurations
- docs/runbooks/ (will be created) - Incident runbooks
- Tests for operations modules (will be created)
Modified:
- .session/config.json - Monitoring configuration
- src/solokit/quality/gates.py - Verify monitoring setup
- CI/CD workflows (will be created) - Deploy monitoring configs
Benefits:
- Proactive issue detection: Find problems before users
- Faster incident response: Automated incident creation
- Performance visibility: Know system health at all times
- Capacity planning: Scale before running out of resources
- Reduced alert fatigue: Intelligent alerting
- Operational confidence: Always know system status
Priority: High - Essential for production operations
Enhancement #35: Project Progress Dashboard
Status: 🔵 IDENTIFIED
Problem:
No high-level view of project progress:
- No progress visibility: Don't know how much is complete
- No milestone tracking: Can't see milestone progress
- No velocity trends: Don't know if on track
Proposed Solution:
Implement project progress dashboard showing overall status:
- Progress Visualization
  - Work items by status (pie chart)
  - Completion percentage by milestone
  - Burndown charts
- Velocity Tracking
  - Story points completed per week
  - Velocity trends
  - Projected completion dates
- Blocker Identification
  - Blocked work items highlighted
  - Risk indicators
Implementation:
Dashboard command:
/sk:status --project
Dashboard generator:
# src/solokit/visualization/dashboard.py (will be created)
class ProgressDashboard:
    def generate_dashboard(self):
        # Aggregate work item data from repository
        # Generate charts and metrics
        # Format as markdown
        raise NotImplementedError
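The aggregation and Markdown formatting steps could be as small as the sketch below. It assumes each work item is a dictionary with a `status` field and that finished items carry the status `"completed"`; both are assumptions about the work item schema, and charts are left out.

```python
from collections import Counter
from typing import Dict, List


def generate_dashboard(work_items: List[Dict[str, str]]) -> str:
    """Render work item counts by status as a small Markdown report."""
    counts = Counter(item["status"] for item in work_items)
    total = len(work_items)
    done = counts.get("completed", 0)  # Assumed status name.
    percent = round(100 * done / total) if total else 0
    lines = [
        "## Project Progress",
        f"- Completion: {done}/{total} ({percent}%)",
    ]
    # One line per status, in alphabetical order for stable output.
    for status, count in sorted(counts.items()):
        lines.append(f"- {status}: {count}")
    return "\n".join(lines)


items = [
    {"status": "completed"},
    {"status": "completed"},
    {"status": "in_progress"},
    {"status": "blocked"},
]
print(generate_dashboard(items))
```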
Files Affected:
New:
- src/solokit/visualization/dashboard.py (will be created) - Dashboard generation
- Tests for dashboard (will be created)
Modified:
- .claude/commands/status.md - Add project dashboard
- src/solokit/work_items/repository.py - Query work items for dashboard data
Benefits:
- Progress visibility: Know project status at glance
- Milestone tracking: See progress toward milestones
- Trend analysis: Know if on track
- Risk awareness: Blockers highlighted
Priority: Low - Nice to have, not critical
Enhancement #36: Compliance & Regulatory Framework
Status: 🔵 IDENTIFIED
Problem:
Projects handling sensitive data must comply with various regulations, but there's no automated compliance tracking:
- No compliance validation: GDPR, HIPAA, SOC2, PCI-DSS requirements not checked
- Data privacy gaps: Personal data handling not tracked or validated
- Audit trail missing: No comprehensive logging for compliance audits
- Manual compliance checks: Time-consuming and error-prone manual verification
- Regulation changes: No monitoring for updates to compliance requirements
Example of compliance failure:
Collect user data → Store in database → Deploy
→ GDPR audit ✗
→ Missing: consent tracking, data export, deletion
→ Fines and legal issues
→ Damage to reputation
Proposed Solution:
Implement compliance and regulatory framework for automated compliance tracking and validation:
- GDPR Compliance
  - Data processing activity tracking
  - User consent management and audit trail
  - Right to access (data export) automation
  - Right to erasure (data deletion) automation
  - Data breach notification procedures
  - Privacy impact assessments
- HIPAA Compliance (Healthcare)
  - PHI (Protected Health Information) identification and tracking
  - Access control and audit logging
  - Encryption at rest and in transit validation
  - Business Associate Agreement (BAA) tracking
  - Breach notification procedures
  - Security risk assessments
- SOC 2 Compliance
  - Security controls validation
  - Availability monitoring
  - Processing integrity checks
  - Confidentiality verification
  - Privacy controls
  - Continuous control monitoring
- PCI-DSS Compliance (Payment Card Industry)
  - Payment data identification and protection
  - Network security requirements
  - Access control validation
  - Regular security testing
  - Security policy enforcement
- Compliance Automation
  - Automated compliance checks in CI/CD
  - Real-time compliance monitoring
  - Compliance dashboard and reporting
  - Evidence collection for audits
  - Automated remediation suggestions
Implementation:
Compliance checker:
# src/solokit/compliance/compliance_checker.py (will be created)
class ComplianceChecker:
    def check_gdpr_compliance(self, codebase):
        # Verify GDPR requirements
        # - Consent tracking
        # - Data export functionality
        # - Data deletion functionality
        # - Data retention policies
        # - Privacy policy exists
        raise NotImplementedError

    def check_hipaa_compliance(self, codebase):
        # Verify HIPAA requirements
        # - PHI encryption
        # - Access controls
        # - Audit logging
        # - BAA tracking
        raise NotImplementedError

    def check_soc2_compliance(self, system):
        # Verify SOC 2 controls
        # - Security controls
        # - Availability metrics
        # - Processing integrity
        # - Confidentiality
        raise NotImplementedError

    def check_pci_dss_compliance(self, codebase):
        # Verify PCI-DSS requirements
        # - Card data encryption
        # - Network segmentation
        # - Access controls
        # - Regular security testing
        raise NotImplementedError
GDPR automation:
# src/solokit/compliance/gdpr_automation.py (will be created)
class GDPRAutomation:
    def track_consent(self, user_id, consent_type):
        # Record user consent with timestamp
        # Track consent version
        # Provide consent audit trail
        raise NotImplementedError

    def export_user_data(self, user_id):
        # Collect all user data across systems
        # Generate machine-readable export (JSON)
        # Include data processing activities log
        raise NotImplementedError

    def delete_user_data(self, user_id):
        # Identify all user data locations
        # Delete or anonymize data
        # Maintain deletion audit trail
        # Verify deletion completeness
        raise NotImplementedError

    def generate_privacy_impact_assessment(self, feature):
        # Identify personal data collected
        # Assess privacy risks
        # Propose mitigation measures
        raise NotImplementedError
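The consent-tracking step above could be sketched as an append-only ledger. This is a minimal in-memory illustration, not the planned implementation: `ConsentRecord`, `ConsentLedger`, `has_consent`, and the `policy_version` parameter are hypothetical names, and a real version would persist records rather than hold them in a list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConsentRecord:
    """One immutable consent decision (hypothetical shape)."""
    user_id: str
    consent_type: str
    policy_version: str
    granted: bool
    recorded_at: datetime


@dataclass
class ConsentLedger:
    """Append-only ledger: records are added, never edited or removed."""
    _records: list[ConsentRecord] = field(default_factory=list)

    def track_consent(self, user_id, consent_type, policy_version, granted=True):
        # Record the decision with a UTC timestamp and the consent version.
        record = ConsentRecord(
            user_id, consent_type, policy_version, granted,
            datetime.now(timezone.utc),
        )
        self._records.append(record)
        return record

    def audit_trail(self, user_id):
        # Every decision for this user, oldest first.
        return [r for r in self._records if r.user_id == user_id]

    def has_consent(self, user_id, consent_type):
        # The most recent decision wins; no record means no consent.
        for r in reversed(self._records):
            if r.user_id == user_id and r.consent_type == consent_type:
                return r.granted
        return False
```

Keeping withdrawals as new records, rather than deleting the earlier grant, is what lets the ledger double as the audit trail.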
Compliance configuration:
# .session/config.json (extended) or compliance_config.yml (will be created)
compliance:
  regulations:
    - gdpr
    - soc2
    # - hipaa # Enable for healthcare
    # - pci_dss # Enable for payment processing
  gdpr:
    enabled: true
    data_retention_days: 365
    consent_tracking: true
    require_privacy_policy: true
    require_data_export: true
    require_data_deletion: true
  soc2:
    enabled: true
    trust_service_criteria:
      - security
      - availability
      - processing_integrity
      - confidentiality
      - privacy
    control_monitoring: true
  hipaa:
    enabled: false
    phi_identification: true
    encryption_required: true
    audit_logging: true
    minimum_necessary_access: true
  pci_dss:
    enabled: false
    cardholder_data_environment: false
    tokenization_required: true
    security_testing_frequency: "quarterly"
  audit:
    evidence_collection: true
    evidence_storage: ".compliance/evidence/"
    audit_log_retention_days: 2555 # 7 years
  alerts:
    compliance_violations: ["email", "slack"]
    regulation_updates: ["email"]
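The configuration above names each regulation twice: once in the `regulations` list and once through a per-regulation `enabled` flag. A loader would probably want to confirm the two agree. The sketch below assumes the `compliance` section has already been parsed into a plain dict; `enabled_regulations` and `config_conflicts` are hypothetical helper names.

```python
KNOWN_REGULATIONS = ("gdpr", "soc2", "hipaa", "pci_dss")


def enabled_regulations(compliance: dict) -> list[str]:
    """Regulations that are listed and also have enabled: true."""
    listed = compliance.get("regulations", [])
    return [
        name for name in listed
        if compliance.get(name, {}).get("enabled", False)
    ]


def config_conflicts(compliance: dict) -> list[str]:
    """Regulations whose enabled flag disagrees with the regulations list."""
    listed = set(compliance.get("regulations", []))
    conflicts = []
    for name in KNOWN_REGULATIONS:
        flag = compliance.get(name, {}).get("enabled", False)
        if flag != (name in listed):
            conflicts.append(name)
    return conflicts
```

For the example configuration both helpers agree: `gdpr` and `soc2` are active and there are no conflicts. Uncommenting `hipaa` in the list without flipping `hipaa.enabled` would be reported as a conflict.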
Compliance dashboard:
# /sk:compliance-status
## Compliance Overview
- GDPR: ✅ Compliant (98% - 1 minor issue)
- SOC 2: ⚠️ Partially Compliant (85% - 3 controls need attention)
- HIPAA: N/A (Not enabled)
- PCI-DSS: N/A (Not enabled)
## GDPR Compliance Details
✅ Consent tracking: Implemented
✅ Data export: Implemented (/api/user/export)
✅ Data deletion: Implemented (/api/user/delete)
✅ Privacy policy: Published and versioned
⚠️ Data retention: Policy defined but not enforced in code
## SOC 2 Compliance Details
✅ Security: Multi-factor auth, encryption, access controls
✅ Availability: 99.9% uptime, monitoring, alerting
⚠️ Processing Integrity: Missing transaction logging for audit
⚠️ Confidentiality: Some sensitive data not encrypted at rest
✅ Privacy: GDPR controls cover privacy requirements
## Action Items
1. Implement automated data retention enforcement (GDPR)
2. Add transaction audit logging (SOC 2 - Processing Integrity)
3. Encrypt sensitive configuration data at rest (SOC 2 - Confidentiality)
## Next Audit: 2025-12-01
## Last Audit: 2025-06-15 (Passed with minor findings)
Commands:
# Check compliance status
/sk:compliance-status [--regulation gdpr|hipaa|soc2|pci-dss]
# Generate compliance report
/sk:compliance-report --regulation gdpr --output pdf
# Run compliance checks
/sk:compliance-check --fix
# Generate privacy impact assessment
/sk:compliance-pia --feature "user-analytics"
# Export evidence for audit
/sk:compliance-evidence-export --period "2025-01-01..2025-12-31"
Files Affected:
New:
- src/solokit/compliance/compliance_checker.py (will be created) - Compliance validation
- src/solokit/compliance/gdpr_automation.py (will be created) - GDPR automation
- src/solokit/compliance/hipaa_checker.py (will be created) - HIPAA compliance
- src/solokit/compliance/soc2_monitor.py (will be created) - SOC 2 monitoring
- src/solokit/compliance/pci_dss_validator.py (will be created) - PCI-DSS validation
- src/solokit/compliance/audit_trail.py (will be created) - Audit logging
- src/solokit/compliance/evidence_collector.py (will be created) - Evidence management
- .claude/commands/compliance-status.md (will be created) - Compliance status command
- .claude/commands/compliance-report.md (will be created) - Report generation command
- .claude/commands/compliance-check.md (will be created) - Compliance validation command
- .compliance/evidence/ (will be created) - Audit evidence storage
- compliance_config.yml (will be created) - Compliance configuration
- Tests for compliance modules (will be created)
Modified:
- src/solokit/project/init.py - Add compliance setup to project initialization
- src/solokit/quality/gates.py - Add compliance gates
- .session/config.json - Add compliance configuration
- CI/CD workflows (will be created) - Add compliance check jobs
Benefits:
- Automated compliance: Continuous compliance monitoring and validation
- Audit readiness: Evidence automatically collected for audits
- Risk mitigation: Catch compliance issues before they become problems
- Regulation tracking: Stay updated on compliance requirement changes
- Cost savings: Reduce manual compliance effort and potential fines
- Customer trust: Demonstrate commitment to data protection
- Legal protection: Documented compliance procedures and audit trails
- Multi-regulation support: Handle multiple compliance requirements simultaneously
Priority: High - Critical for regulated industries (healthcare, finance, e-commerce)
Notes:
- Compliance requirements vary by jurisdiction and industry
- Regular compliance audits recommended (quarterly or annually)
- Legal review recommended for compliance implementation
- Some regulations require third-party audits (e.g., SOC 2)
- Compliance is ongoing, not a one-time effort
Enhancement #37: UAT & Stakeholder Workflow
Status: 🔵 IDENTIFIED
Problem:
No workflow for stakeholder feedback and user acceptance testing:
- No stakeholder involvement: Stakeholders see features only at launch
- No UAT process: No formal user acceptance testing
- No demo environments: Difficult to show work in progress
- No approval workflow: No sign-off before production
Example:
Feature built → Tests pass ✓ → Deploy to production
→ Stakeholder sees feature for first time
→ "This isn't what I wanted" ✗
→ Rework required
Proposed Solution:
Implement UAT and stakeholder workflow for feedback and approvals:
- Stakeholder Feedback Collection
  - Create shareable demo links
  - Collect structured feedback
  - Track feedback status (addressed/rejected/pending)
  - Link feedback to work items
- UAT Test Case Generation
  - Auto-generate UAT test cases from acceptance criteria
  - Provide test case checklist for stakeholders
  - Track UAT execution and results
- Demo/Preview Environments
  - Auto-create preview environment per work item
  - Shareable URL for stakeholder review
  - Temporary environment (auto-deleted after merge)
  - Tools: Vercel preview deployments, Netlify deploy previews, PR environments
- Approval Workflow Before Production
  - Require stakeholder approval before production deploy
  - Track approval status
  - Block production deployment without approval
  - Document approval decisions
Implementation:
Demo environment:
# src/solokit/uat/demo_environment.py (will be created)
class DemoEnvironmentManager:
    def create_preview(self, work_item_id, branch):
        # Deploy branch to preview environment
        # Return preview URL
        raise NotImplementedError

    def share_with_stakeholders(self, preview_url, stakeholders):
        # Send preview link to stakeholders
        # Include UAT test cases
        raise NotImplementedError
Feedback collection:
# src/solokit/uat/feedback_collector.py (will be created)
class FeedbackCollector:
    def create_feedback_form(self, work_item):
        # Generate feedback form
        # Include UAT test cases
        raise NotImplementedError

    def collect_feedback(self, form_id):
        # Retrieve stakeholder feedback
        # Parse and structure feedback
        raise NotImplementedError

    def link_to_work_item(self, feedback, work_item_id):
        # Associate feedback with work item
        # Create follow-up tasks if needed
        raise NotImplementedError
UAT test case generator:
# src/solokit/uat/test_case_generator.py (will be created)
class UATTestCaseGenerator:
    def generate_from_acceptance_criteria(self, work_item):
        # Parse acceptance criteria
        # Generate UAT test cases
        # Format as checklist
        raise NotImplementedError
Example UAT test cases:
# UAT Test Cases: User Authentication
## Test Case 1: Successful Login
**Given:** User has valid credentials
**When:** User enters email and password
**Then:**
- [ ] User is redirected to dashboard
- [ ] Welcome message displays user's name
- [ ] Session token is stored
## Test Case 2: Failed Login
**Given:** User enters invalid password
**When:** User submits login form
**Then:**
- [ ] Error message "Invalid credentials" displays
- [ ] User remains on login page
- [ ] No session token stored
## Test Case 3: Forgot Password
**Given:** User clicks "Forgot Password"
**When:** User enters email address
**Then:**
- [ ] Email with reset link sent
- [ ] Confirmation message displays
- [ ] Reset link expires in 1 hour
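The checklist format above could be rendered from structured acceptance criteria along these lines. This is only a sketch: `AcceptanceCriterion` and `render_uat_checklist` are hypothetical names, and it assumes the criteria have already been parsed out of the work item spec, which is not shown here.

```python
from dataclasses import dataclass


@dataclass
class AcceptanceCriterion:
    """One Given/When/Then criterion (hypothetical shape)."""
    title: str
    given: str
    when: str
    then: list[str]


def render_uat_checklist(feature: str, criteria: list[AcceptanceCriterion]) -> str:
    """Render criteria as a Markdown checklist stakeholders can tick off."""
    lines = [f"# UAT Test Cases: {feature}"]
    for number, criterion in enumerate(criteria, start=1):
        lines.append(f"## Test Case {number}: {criterion.title}")
        lines.append(f"**Given:** {criterion.given}")
        lines.append(f"**When:** {criterion.when}")
        lines.append("**Then:**")
        # Each expected outcome becomes an unchecked task-list item.
        lines.extend(f"- [ ] {outcome}" for outcome in criterion.then)
    return "\n".join(lines)
```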
Approval workflow:
# src/solokit/uat/approval_workflow.py (will be created)
class ApprovalWorkflow:
    def request_approval(self, work_item_id, stakeholders):
        # Send approval request
        # Include demo link and UAT results
        raise NotImplementedError

    def check_approval_status(self, work_item_id):
        # Check if approved
        # Block deployment if not approved
        raise NotImplementedError

    def record_approval(self, work_item_id, approver, decision):
        # Record approval decision
        # Document reasoning
        raise NotImplementedError
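The "block deployment without approval" rule can be illustrated with a small gate object. This is a sketch under one stated assumption, that every required stakeholder must approve; `ApprovalGate`, `pending`, and `can_deploy` are hypothetical names, not part of the planned module.

```python
from dataclasses import dataclass, field


@dataclass
class ApprovalGate:
    """Tracks sign-off for one work item (hypothetical shape)."""
    required_approvers: set[str]
    # approver -> (approved?, documented reasoning)
    decisions: dict[str, tuple[bool, str]] = field(default_factory=dict)

    def record_approval(self, approver: str, approved: bool, reasoning: str = "") -> None:
        if approver not in self.required_approvers:
            raise ValueError(f"{approver} is not a required approver")
        # A later decision from the same approver replaces the earlier one.
        self.decisions[approver] = (approved, reasoning)

    def pending(self) -> set[str]:
        # Approvers who have not responded yet.
        return self.required_approvers - set(self.decisions)

    def can_deploy(self) -> bool:
        # Deployable only when everyone has answered and nobody rejected.
        return not self.pending() and all(ok for ok, _ in self.decisions.values())
```

A quality gate could call `can_deploy()` and fail the production step whenever it returns `False`.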
Files Affected:
New:
- src/solokit/uat/demo_environment.py (will be created) - Demo environment management
- src/solokit/uat/feedback_collector.py (will be created) - Feedback collection
- src/solokit/uat/test_case_generator.py (will be created) - UAT test case generation
- src/solokit/uat/approval_workflow.py (will be created) - Approval management
- .session/tracking/feedback/ (will be created) - Feedback storage
- .session/tracking/approvals/ (will be created) - Approval records
- Tests for UAT modules (will be created)
Modified:
- src/solokit/session/complete.py - Request approval before production deployment
- src/solokit/quality/gates.py - Block deployment without approval
- .session/config.json - UAT and approval configuration
Benefits:
- Early feedback: Stakeholders see features before production
- Reduce rework: Catch misalignments before deployment
- Formal UAT: Structured testing process
- Approval tracking: Know what's approved for production
- Demo environments: Easy to share work in progress
- Stakeholder confidence: Involved throughout development
Priority: Medium - Important for stakeholder collaboration
Enhancement #38: Cost & Resource Optimization
Status: 🔵 IDENTIFIED
Problem:
Cloud costs can spiral out of control without monitoring and optimization:
- No cost visibility: Don't know where money is being spent
- Resource waste: Over-provisioned or unused resources
- No budget alerts: Costs exceed budget without warning
- Inefficient architecture: Expensive architectures when cheaper alternatives exist
- No optimization recommendations: Manual cost optimization is time-consuming
Example of cost waste:
Deploy application → Runs for 6 months
→ Database over-provisioned (90% idle)
→ Load balancer for single instance
→ Storage full of old logs
→ Monthly cost: $1,200
→ Optimized cost could be: $300
→ Wasted: $900/month = $10,800/year
Proposed Solution:
Implement cost and resource optimization framework for monitoring and reducing cloud costs:
- Cost Monitoring & Visibility
  - Real-time cost tracking per service
  - Cost allocation by project/environment/feature
  - Cost trend analysis and forecasting
  - Budget tracking and alerts
  - Multi-cloud cost aggregation (AWS, GCP, Azure)
- Resource Utilization Analysis
  - Identify under-utilized resources
  - Track resource usage patterns
  - Detect idle or unused resources
  - Analyze peak vs average utilization
  - Right-sizing recommendations
- Automated Cost Optimization
  - Auto-scaling based on actual usage
  - Spot instance recommendations
  - Reserved instance analysis
  - Storage tier optimization (hot/warm/cold)
  - Automated cleanup of unused resources
- Cost Optimization Recommendations
  - Alternative architecture suggestions
  - Service tier optimization
  - Region cost comparisons
  - Commitment discount opportunities
  - Open-source alternative suggestions
- Budget Management
  - Set budget limits per environment
  - Automated alerts on threshold breach
  - Spending forecasts
  - Cost anomaly detection
  - Automated resource shutdown on budget exceeded
Implementation:
Cost monitor:
# src/solokit/cost/cost_monitor.py (will be created)
class CostMonitor:
    def track_current_costs(self):
        # Query cloud provider billing APIs
        # Aggregate costs by service, region, project
        # Calculate daily/weekly/monthly costs
        raise NotImplementedError

    def analyze_cost_trends(self):
        # Historical cost analysis
        # Identify cost spikes
        # Forecast future costs
        raise NotImplementedError

    def alert_on_budget_breach(self, threshold):
        # Check if costs exceed budget
        # Send alerts to configured channels
        # Trigger automated actions if needed
        raise NotImplementedError
Resource optimizer:
# src/solokit/cost/resource_optimizer.py (will be created)
class ResourceOptimizer:
    def identify_underutilized_resources(self):
        # Analyze CPU, memory, disk usage
        # Identify resources with <30% utilization
        # Calculate potential savings
        raise NotImplementedError

    def recommend_rightsizing(self, resource):
        # Analyze historical usage patterns
        # Recommend appropriate instance types
        # Calculate cost savings
        raise NotImplementedError

    def find_idle_resources(self):
        # Identify stopped instances still incurring costs
        # Find unused load balancers, IPs, volumes
        # Estimate monthly waste
        raise NotImplementedError

    def optimize_storage_tiers(self):
        # Analyze storage access patterns
        # Recommend tier migrations (hot → cold)
        # Calculate storage cost savings
        raise NotImplementedError
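The "<30% utilization" rule could be sketched as follows. The example only looks at average CPU and assumes utilization samples have already been collected from the provider's metrics; `ResourceUsage` and `identify_underutilized` are hypothetical names, and savings estimates are left out because they depend on provider pricing.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ResourceUsage:
    """CPU samples for one resource, as percentages from 0 to 100 (hypothetical shape)."""
    name: str
    cpu_samples: list[float]


def identify_underutilized(
    resources: list[ResourceUsage], threshold: float = 30.0
) -> list[tuple[str, float]]:
    """Return (name, average CPU %) for resources averaging below the threshold."""
    flagged = []
    for resource in resources:
        if not resource.cpu_samples:
            # No data is not the same as idle; skip rather than guess.
            continue
        average = mean(resource.cpu_samples)
        if average < threshold:
            flagged.append((resource.name, round(average, 1)))
    return flagged
```

A fuller version would likely combine CPU with memory and disk, and compare peak against average, before recommending a smaller instance.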
Cost optimization engine:
# src/solokit/cost/optimization_engine.py (will be created)
class CostOptimizationEngine:
    def recommend_spot_instances(self):
        # Identify workloads suitable for spot instances
        # Calculate potential savings (60-90% off)
        # Provide migration guide
        raise NotImplementedError

    def analyze_reserved_instances(self):
        # Compare on-demand vs reserved pricing
        # Recommend reservation commitments
        # Calculate breakeven point
        raise NotImplementedError

    def suggest_architectural_changes(self):
        # Identify expensive patterns
        # Suggest cheaper alternatives
        # Estimate implementation effort vs savings
        raise NotImplementedError

    def recommend_service_alternatives(self):
        # Identify overpriced managed services
        # Suggest open-source alternatives
        # Calculate TCO comparison
        raise NotImplementedError
Cost configuration:
# .session/config.json (extended) or cost_config.yml (will be created)
cost_optimization:
  monitoring:
    enabled: true
    cloud_providers:
      - aws
      - gcp
      # - azure
    update_frequency: "hourly"
  budgets:
    development:
      monthly_limit: 500
      alert_thresholds: [50, 75, 90, 100]
    staging:
      monthly_limit: 200
      alert_thresholds: [75, 90, 100]
    production:
      monthly_limit: 2000
      alert_thresholds: [75, 90, 100]
      auto_shutdown: false # Don't auto-shutdown production
  optimization:
    auto_rightsizing: false # Recommend only, don't auto-apply
    auto_cleanup_idle: true # Clean up stopped resources after 7 days
    storage_tier_optimization: true
    reserved_instance_analysis: true
  alerts:
    cost_alerts: ["email", "slack"]
    optimization_opportunities: ["email"]
    budget_breach: ["email", "pagerduty"]
  reporting:
    weekly_cost_report: true
    monthly_optimization_report: true
    savings_tracking: true
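The `alert_thresholds` values are read here as percentages of `monthly_limit`. Under that assumption, deciding which alerts should fire is a small calculation; `crossed_thresholds` is a hypothetical helper name.

```python
def crossed_thresholds(
    spend: float, monthly_limit: float, alert_thresholds: list[int]
) -> list[int]:
    """Return the percentage thresholds that current spend has reached."""
    if monthly_limit <= 0:
        raise ValueError("monthly_limit must be positive")
    used_percent = spend * 100 / monthly_limit
    return [t for t in sorted(alert_thresholds) if used_percent >= t]
```

With the example budgets, staging spend of 172 against a limit of 200 is 86% and crosses only the 75 threshold, while production spend of 1247 against 2000 is about 62% and crosses none.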
Cost dashboard:
# /sk:cost-status
## Monthly Cost Summary
- **Current Month**: $1,247 / $2,000 budget (62%)
- **Last Month**: $1,189
- **Forecast**: $1,650 (18% under budget)
- **YoY Growth**: +12%
## Cost Breakdown by Service
- Compute (EC2/VMs): $687 (55%)
- Database (RDS/Cloud SQL): $312 (25%)
- Storage (S3/GCS): $127 (10%)
- Networking: $89 (7%)
- Other: $32 (3%)
## Cost by Environment
- Production: $987 (79%)
- Staging: $172 (14%)
- Development: $88 (7%)
## Optimization Opportunities
1. **Right-size database** - Current: db.m5.2xlarge ($562/mo), Recommended: db.m5.xlarge ($281/mo)
   - Savings: $281/month ($3,372/year)
   - Utilization: 28% average CPU
2. **Move logs to cold storage** - 500GB in hot storage ($115/mo), 450GB not accessed in 90 days
   - Savings: $90/month ($1,080/year)
   - Move 450GB to Glacier
3. **Use spot instances for batch jobs** - 5 instances running 24/7 ($365/mo)
   - Savings: $255/month ($3,060/year)
   - 70% cost reduction with spot
4. **Remove unused load balancer** - 1 ALB with no traffic ($23/mo)
   - Savings: $23/month ($276/year)
## Total Potential Savings: $649/month ($7,788/year)
## Current Optimization Score: 72/100
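The dashboard totals follow directly from the individual opportunities. A short sketch of the arithmetic, using hypothetical helper names and the example figures from the dashboard above:

```python
def summarize_savings(monthly_savings: dict[str, int]) -> tuple[int, int]:
    """Return (monthly total, annual total) for a set of opportunities."""
    monthly_total = sum(monthly_savings.values())
    return monthly_total, monthly_total * 12


def budget_used_percent(spend: float, budget: float) -> int:
    """Share of the budget already spent, rounded to a whole percent."""
    return round(spend * 100 / budget)


# Monthly savings in dollars, taken from the example dashboard.
opportunities = {
    "right-size database": 281,
    "move logs to cold storage": 90,
    "spot instances for batch jobs": 255,
    "remove unused load balancer": 23,
}

monthly, annual = summarize_savings(opportunities)
print(f"Total potential savings: ${monthly}/month (${annual:,}/year)")
print(f"Budget used: {budget_used_percent(1247, 2000)}%")
```

The four opportunities sum to $649 per month, or $7,788 per year, and $1,247 of a $2,000 budget rounds to 62%.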
Commands:
# View cost status
/sk:cost-status [--environment prod|staging|dev]
# Analyze optimization opportunities
/sk:cost-optimize --analyze
# Generate cost report
/sk:cost-report --period "2025-01-01..2025-12-31" --output pdf
# Set budget alert
/sk:cost-budget-set --environment prod --limit 2000 --currency USD
# Forecast costs
/sk:cost-forecast --months 6
Files Affected:
New:
- src/solokit/cost/cost_monitor.py (will be created) - Cost tracking and monitoring
- src/solokit/cost/resource_optimizer.py (will be created) - Resource utilization analysis
- src/solokit/cost/optimization_engine.py (will be created) - Cost optimization recommendations
- src/solokit/cost/budget_manager.py (will be created) - Budget tracking and alerts
- src/solokit/cost/cloud_provider_integrations/ (will be created) - AWS, GCP, Azure integrations
- .claude/commands/cost-status.md (will be created) - Cost status command
- .claude/commands/cost-optimize.md (will be created) - Optimization command
- .claude/commands/cost-report.md (will be created) - Cost reporting command
- .claude/commands/cost-budget-set.md (will be created) - Budget management command
- cost_config.yml (will be created) - Cost optimization configuration
- Tests for cost monitoring modules (will be created)
Modified:
- src/solokit/project/init.py - Add cost monitoring setup
- .session/config.json - Add cost optimization configuration
- CI/CD workflows (will be created) - Add cost check jobs
Benefits:
- Cost visibility: Always know where money is spent
- Budget control: Prevent cost overruns with alerts and limits
- Resource efficiency: Eliminate waste from idle or over-provisioned resources
- Predictable costs: Accurate forecasting for budget planning
- Automated savings: Continuous optimization without manual effort
- Multi-cloud support: Track costs across multiple cloud providers
- ROI tracking: Measure savings from optimization efforts
- Financial accountability: Cost allocation per project/team
Priority: Medium-High - Important for budget-conscious solo developers and startups
Notes:
- Requires cloud provider API credentials with billing access
- Cost data typically has 24-hour delay
- Aggressive optimization can impact performance (monitor carefully)
- Reserved instances require commitment (1-3 years)
- Consider business criticality before automated resource shutdown
Enhancement #39: Automated Code Review
Status: 🔵 IDENTIFIED
Problem:
Code reviews are manual and time-consuming, and they suffer from recurring problems:
- No automated review: Every line requires human review
- Inconsistent feedback: Review quality varies
- Common patterns missed: Same issues repeat
- Security vulnerabilities: May be overlooked in review
Proposed Solution:
Implement AI-powered automated code review that provides suggestions:
- Code Analysis
  - Analyze code changes for common issues
  - Detect anti-patterns and code smells
  - Identify performance issues
- Best Practice Recommendations
  - Suggest better patterns and approaches
  - Recommend idiomatic code
  - Link to documentation and examples
- Security Vulnerability Detection
  - Identify security issues in code
  - Suggest secure alternatives
  - Link to security best practices
- Improvement Suggestions
  - Suggest refactoring opportunities
  - Identify complexity issues
  - Recommend simplifications
Implementation:
Code reviewer:
# src/solokit/review/code_reviewer.py (will be created)
class AutomatedCodeReviewer:
    def review_changes(self, file_changes):
        # Analyze code changes
        # Generate review comments
        raise NotImplementedError

    def detect_issues(self, code):
        # Find anti-patterns, code smells
        raise NotImplementedError

    def suggest_improvements(self, code):
        # Recommend better approaches
        raise NotImplementedError
Files Affected:
New:
- src/solokit/review/code_reviewer.py (will be created) - Automated review
- src/solokit/review/pattern_detector.py (will be created) - Anti-pattern detection
- src/solokit/review/security_analyzer.py (will be created) - Security vulnerability detection
- Tests for code review modules (will be created)
Modified:
- src/solokit/session/complete.py - Run automated review before completion
- src/solokit/quality/gates.py - Add code review quality gate
- .session/config.json - Add code review configuration
Benefits:
- Faster reviews: Automated feedback
- Consistent quality: Every change reviewed
- Learning opportunity: Suggestions improve skills
- Catch issues early: Problems found before merge
Priority: Low - Nice to have, not critical
Enhancement #40: React Performance Best Practices Integration
Status: 🔵 IDENTIFIED
Problem:
AI assistants writing React code often produce functional but suboptimal code with common performance anti-patterns:
- Async waterfalls: Sequential data fetching when parallel is possible (e.g., awaiting data before early returns)
- Bundle size bloat: Heavy client-side imports, missing code splitting, unnecessary dependencies
- Re-render storms: Missing memoization, improper useEffect dependencies, cascading state updates
- Server/client misalignment: Work done on client that should be server-side, or vice versa
- Framework anti-patterns: Not leveraging Next.js App Router optimizations, RSC patterns
These issues are particularly problematic for solo developers who may not have the expertise to catch them in code review.
Context:
Vercel released react-best-practices (January 2026) - a structured knowledge base of 40+ rules across 8 priority categories, specifically designed for AI agent consumption:
| Category | Prefix | Focus | Impact Level |
|---|---|---|---|
| Eliminating Waterfalls | async-* | Sequential → parallel data fetching | CRITICAL |
| Bundle Size | bundle-* | Code splitting, tree shaking, lazy loading | CRITICAL |
| Server-Side Performance | server-* | RSC, streaming, edge runtime | HIGH |
| Client-Side Data Fetching | client-* | SWR, React Query, caching | HIGH |
| Re-render Optimization | rerender-* | Memoization, state management | MEDIUM-HIGH |
| Rendering Performance | rendering-* | Virtual DOM, reconciliation | MEDIUM |
| Advanced Patterns | advanced-* | Suspense, transitions, concurrent | LOW-MEDIUM |
| JavaScript Performance | js-* | Micro-optimizations | LOW |
Each rule includes:
- Impact level classification
- Problematic code example (what NOT to do)
- Correct code example (what TO do)
- Explanation of why it matters
- References to documentation
Source: https://vercel.com/blog/introducing-react-best-practices
Repository: https://github.com/vercel-labs/agent-skills/tree/main/skills/react-best-practices
Applicable Solokit Stacks:
Three of four Solokit stacks use React:
- saas_t3 - Next.js 16 + React 19 + tRPC + Prisma
- dashboard_refine - Next.js 16 + Refine 5 + shadcn/ui
- fullstack_nextjs - Next.js 16 + React 19 + Prisma
Only ml_ai_fastapi (Python/FastAPI) is out of scope.
Proposed Solution:
Integrate React performance best practices into Solokit's quality and guidance system for React-based stacks:
1. React Performance Guide Generation
Generate a REACT_PERFORMANCE_GUIDE.md in .session/guides/ during sk init for React stacks:
# src/solokit/init/react_performance_guide.py (will be created)
class ReactPerformanceGuideGenerator:
    def __init__(self, quality_tier: int, stack: str):
        self.quality_tier = quality_tier
        self.stack = stack

    def generate(self) -> str:
        """Generate tier-appropriate React performance guide."""
        rules = self._get_rules_for_tier()
        return self._render_guide(rules)

    def _get_rules_for_tier(self) -> list[dict]:
        """Return rules appropriate for quality tier."""
        # Tier 1: CRITICAL only
        # Tier 2: Add HIGH
        # Tier 3: Add MEDIUM-HIGH, MEDIUM
        # Tier 4: Full coverage including advanced patterns
2. Rule Curation by Quality Tier
Map Vercel's impact levels to Solokit's quality tiers:
| Quality Tier | Included Rules | Rationale |
|---|---|---|
| Tier 1: Essential | CRITICAL only (async waterfalls, bundle size) | Focus on highest-impact issues |
| Tier 2: Standard | CRITICAL + HIGH (+ server, client) | Add server/client optimization |
| Tier 3: Comprehensive | CRITICAL through MEDIUM (excludes LOW-MEDIUM and LOW) | Full performance coverage |
| Tier 4: Production-Ready | All 40+ rules | Complete best practices |
3. Session Briefing Integration
Include relevant React performance reminders in session briefings when working on React components:
# src/solokit/session/briefing.py (modified)
def _get_react_performance_context(self, work_item: WorkItem) -> str:
    """Include React performance guidance for component work."""
    if self._is_react_component_work(work_item):
        return self._load_relevant_rules(work_item)
    return ""
4. Anti-Pattern Detection Quality Gate (Optional)
Add a quality gate checker that scans for known anti-patterns:
# src/solokit/quality/checkers/react_performance_checker.py (will be created)
class ReactPerformanceChecker(BaseChecker):
    """Detect common React performance anti-patterns."""

    PATTERNS = {
        "cascading_useeffect": r"useEffect\([^)]+\)[\s\S]*?useEffect\(",
        "heavy_client_import": r"'use client'[\s\S]*?import.*from\s+['\"](?:lodash|moment)['\"]",
        "missing_suspense": r"async function.*Component",
        # ... more patterns
    }

    def check(self, files: list[str]) -> CheckResult:
        """Scan React files for anti-patterns."""
5. Claude Code Integration
Add React-specific guidance to .claude/ for React stacks:
# .claude/REACT_PERFORMANCE.md (will be created for React stacks)
## React Performance Guidelines
When writing React code in this project, follow these priority-ordered practices:
### CRITICAL: Eliminate Async Waterfalls
- Parallelize independent data fetches with Promise.all()
- Move awaits inside conditionals when early returns exist
- Use React Server Components for data fetching when possible
### CRITICAL: Minimize Bundle Size
- Use dynamic imports for heavy components: `const Chart = dynamic(() => import('./Chart'))`
- Prefer server components (no 'use client' unless needed)
- Tree-shake imports: `import { specific } from 'lib'` not `import * as lib from 'lib'`
[... more rules based on quality tier ...]
6. Stack-Specific Adaptations
Different React stacks have different patterns:
| Stack | Special Considerations |
|---|---|
| saas_t3 | tRPC batching, Prisma query optimization |
| dashboard_refine | Refine data provider caching, table virtualization |
| fullstack_nextjs | Server Actions, streaming, Partial Prerendering |
Implementation:
Phase 1: Guide Generation
# src/solokit/init/react_performance_guide.py
class ReactPerformanceGuideGenerator:
    RULES_BY_IMPACT = {
        "CRITICAL": [
            {
                "id": "async-parallel-fetching",
                "title": "Parallelize Independent Data Fetches",
                "problem": "Sequential awaits for independent data",
                "solution": "Use Promise.all() for parallel fetching",
                "example_bad": "const a = await fetchA(); const b = await fetchB();",
                "example_good": "const [a, b] = await Promise.all([fetchA(), fetchB()]);",
            },
            {
                "id": "async-conditional-await",
                "title": "Move Awaits Inside Conditionals",
                "problem": "Awaiting data before early return checks",
                "solution": "Check conditions before awaiting when possible",
            },
            {
                "id": "bundle-dynamic-imports",
                "title": "Use Dynamic Imports for Heavy Components",
                "problem": "Large components in initial bundle",
                "solution": "Use next/dynamic or React.lazy for code splitting",
            },
            # ... more CRITICAL rules
        ],
        "HIGH": [
            {
                "id": "server-rsc-data-fetching",
                "title": "Fetch Data in Server Components",
                "problem": "Client-side data fetching with useEffect",
                "solution": "Use async Server Components for initial data",
            },
            # ... more HIGH rules
        ],
        # ... MEDIUM-HIGH, MEDIUM, LOW-MEDIUM, LOW
    }

    def generate_for_tier(self, tier: int) -> str:
        """Generate guide content for specified quality tier."""
        included_impacts = self._get_impacts_for_tier(tier)
        rules = []
        for impact in included_impacts:
            rules.extend(self.RULES_BY_IMPACT.get(impact, []))
        return self._render_markdown(rules)

    def _get_impacts_for_tier(self, tier: int) -> list[str]:
        if tier == 1:
            return ["CRITICAL"]
        elif tier == 2:
            return ["CRITICAL", "HIGH"]
        elif tier == 3:
            return ["CRITICAL", "HIGH", "MEDIUM-HIGH", "MEDIUM"]
        else:  # tier 4
            return ["CRITICAL", "HIGH", "MEDIUM-HIGH", "MEDIUM", "LOW-MEDIUM", "LOW"]
Phase 2: Init Integration
# src/solokit/init/orchestrator.py (modified)
def _setup_guides(self, stack: str, tier: int):
    """Generate development guides for the project."""
    # Existing guide generation...
    # Add React performance guide for React stacks
    if stack in ["saas_t3", "dashboard_refine", "fullstack_nextjs"]:
        react_guide = ReactPerformanceGuideGenerator(tier, stack)
        guide_content = react_guide.generate()
        self._write_guide(".session/guides/REACT_PERFORMANCE_GUIDE.md", guide_content)
Phase 3: Briefing Integration
# src/solokit/session/briefing.py (modified)
def _build_briefing(self, work_item: WorkItem) -> str:
    briefing_parts = [
        self._get_header(work_item),
        self._get_spec_summary(work_item),
        self._get_relevant_learnings(work_item),
        self._get_relevant_guides(work_item),  # NEW: includes React perf guide
        self._get_git_context(),
    ]
    return "\n\n".join(briefing_parts)

def _get_relevant_guides(self, work_item: WorkItem) -> str:
    """Include relevant guide excerpts based on work item context."""
    guides = []
    # Include React performance tips for component work
    if self._involves_react_components(work_item):
        react_guide = self._load_guide("REACT_PERFORMANCE_GUIDE.md")
        if react_guide:
            # Include top 5 most relevant rules based on work item
            guides.append(self._extract_relevant_rules(react_guide, work_item))
    return "\n".join(guides)
Phase 4: Quality Gate (Optional)
# src/solokit/quality/checkers/react_performance_checker.py
import re

# CheckContext is assumed to be exported alongside BaseChecker and CheckResult
from solokit.quality.checkers.base import BaseChecker, CheckContext, CheckResult


class ReactPerformanceChecker(BaseChecker):
    """Check for common React performance anti-patterns."""

    name = "react-performance"
    description = "React performance anti-pattern detection"

    # Regex patterns for common anti-patterns
    ANTI_PATTERNS = {
        "sequential-await": {
            "pattern": r"const\s+\w+\s*=\s*await\s+\w+\([^)]*\);\s*\n\s*const\s+\w+\s*=\s*await",
            "message": "Consider using Promise.all() for parallel data fetching",
            "severity": "warning",
            "impact": "CRITICAL",
        },
        "use-client-with-heavy-import": {
            "pattern": r"['\"]use client['\"][\s\S]{0,500}import.*from\s+['\"](?:lodash|moment|date-fns)['\"]",
            "message": "Heavy library imported in client component - consider server component or dynamic import",
            "severity": "warning",
            "impact": "CRITICAL",
        },
        "cascading-useeffect": {
            "pattern": r"useEffect\(\s*\(\)\s*=>\s*\{[^}]+set\w+\([^)]+\)[^}]+\}\s*,\s*\[[^\]]+\]\s*\)[\s\S]{0,200}useEffect",
            "message": "Cascading useEffect calls detected - consider combining or using derived state",
            "severity": "info",
            "impact": "MEDIUM-HIGH",
        },
    }

    def check(self, context: CheckContext) -> CheckResult:
        """Scan React/TSX files for anti-patterns."""
        issues = []
        react_files = self._find_react_files(context.project_root)
        for file_path in react_files:
            content = self._read_file(file_path)
            for pattern_name, pattern_info in self.ANTI_PATTERNS.items():
                if re.search(pattern_info["pattern"], content):
                    issues.append({
                        "file": file_path,
                        "pattern": pattern_name,
                        "message": pattern_info["message"],
                        "severity": pattern_info["severity"],
                    })
        return CheckResult(
            # Only "error" severity fails the gate; warnings and info are advisory
            passed=not any(i["severity"] == "error" for i in issues),
            issues=issues,
            summary=f"Found {len(issues)} potential React performance issues",
        )
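To show what the `sequential-await` pattern does and does not catch, here is a standalone sketch that runs the same regular expression over source text and reports line numbers. `find_sequential_awaits` is a hypothetical helper written for illustration, not part of the checker.

```python
import re

# Same expression as the "sequential-await" entry in ANTI_PATTERNS.
SEQUENTIAL_AWAIT = re.compile(
    r"const\s+\w+\s*=\s*await\s+\w+\([^)]*\);\s*\n\s*const\s+\w+\s*=\s*await"
)


def find_sequential_awaits(source: str) -> list[int]:
    """Return 1-based line numbers where a back-to-back await pair starts."""
    return [
        source.count("\n", 0, match.start()) + 1
        for match in SEQUENTIAL_AWAIT.finditer(source)
    ]


waterfall = (
    "async function Page() {\n"
    "  const a = await fetchA();\n"
    "  const b = await fetchB();\n"
    "  return a + b;\n"
    "}\n"
)
parallel = "const [a, b] = await Promise.all([fetchA(), fetchB()]);\n"

print(find_sequential_awaits(waterfall))  # pair starts on line 2
print(find_sequential_awaits(parallel))   # no matches
```

The pattern is purely textual, so it has blind spots worth knowing about. It only matches plain calls such as `await fetchA()`, so a first call written as `await db.user.find()` is missed. It also cannot tell whether the second await depends on the first result, so a legitimately sequential pair is flagged too. That supports keeping the checker non-blocking.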
Files Affected:
New:
- src/solokit/init/react_performance_guide.py - Guide generator with curated rules
- src/solokit/quality/checkers/react_performance_checker.py - Anti-pattern detector
- src/solokit/templates/saas_t3/base/.claude/REACT_PERFORMANCE.md - Claude guidance
- src/solokit/templates/dashboard_refine/base/.claude/REACT_PERFORMANCE.md - Claude guidance
- src/solokit/templates/fullstack_nextjs/base/.claude/REACT_PERFORMANCE.md - Claude guidance
- src/solokit/data/react_performance_rules.py - Curated rule definitions
- Tests for React performance modules
Modified:
- src/solokit/init/orchestrator.py - Add React guide generation for React stacks
- src/solokit/session/briefing.py - Include React performance context in briefings
- src/solokit/quality/gates.py - Register React performance checker
- .session/config.json schema - Add react_performance checker config
- Template package.json files - No changes (no runtime dependencies)
Testing Requirements:
- Unit Tests:
  - Guide generator produces correct rules for each tier
  - Anti-pattern regex patterns detect known bad patterns
  - Stack detection correctly identifies React stacks
- Integration Tests:
  - sk init generates REACT_PERFORMANCE_GUIDE.md for React stacks
  - sk init does NOT generate guide for ml_ai_fastapi
  - Session briefings include React guidance when appropriate
  - Quality gate reports anti-patterns correctly
- E2E Tests:
  - Full init → start → develop → end cycle with React performance guidance
  - Verify guide content matches quality tier selection
Benefits:
- Proactive quality: AI writes better React code from the start
- Prioritized guidance: Focus on highest-impact issues first (waterfalls, bundle size)
- Tier-appropriate: Don't overwhelm beginners with advanced patterns
- Self-contained: No external dependencies, rules bundled with Solokit
- Stack-aware: Guidance tailored to specific React stack (T3, Refine, Next.js)
- Continuous improvement: Rules can be updated with Solokit releases
- Learning capture: Developers learn best practices through usage
Priority: Medium-High - Significant quality improvement for React stacks (75% of templates)
Notes:
- Rules should be curated from Vercel's repository, not copied verbatim (licensing)
- Consider periodic updates as React/Next.js evolves
- Anti-pattern checker should be non-blocking by default (warnings, not errors)
- May want to add /sk:react-audit command in future for on-demand review
- Integration with Enhancement #29 (Frontend Quality & Design System Compliance)
References:
- Vercel Blog: https://vercel.com/blog/introducing-react-best-practices
- Agent Skills Repo: https://github.com/vercel-labs/agent-skills
- React Performance Docs: https://react.dev/learn/render-and-commit