State Persistence

July 1, 2025 · View on GitHub

Overview

State Persistence is a critical system that ensures all tasks, sessions, and application state survive server restarts, crashes, and updates. It provides reliable filesystem-based storage with automatic backups and recovery mechanisms.

Architecture

                State Persistence

    ┌─────────────────┼─────────────────┐
    │                 │                 │
Task Files       State File        Backups
    │                 │                 │
 Individual      Application      Historical
JSON Files       Snapshot         Snapshots

Storage Structure

Directory Layout

coding-agent-state/           # Default: ./coding-agent-state
├── state.json               # Main application state
├── state.backup.*.json      # Automatic backups (last 10)
├── tasks/                   # Individual task files
│   ├── task_123.json
│   ├── task_456.json
│   └── ...
├── sessions/                # Session data (future use)
├── logs/                    # Session logs
│   └── session_*.log
└── reports/                 # Generated reports
    └── report_*.json

Configurable Location

# Set custom state directory
STATE_PATH=/var/lib/coding-agent npm start

# Or in .env
STATE_PATH=/home/user/.coding-agent-state

Core Features

1. Automatic Persistence

  • Tasks saved immediately on creation/update
  • State snapshot every 30 seconds
  • No data loss on unexpected shutdown

2. Backup Management

  • Automatic backups before each save
  • Keeps last 10 backup files
  • Timestamp-based naming

3. Atomic Operations

  • Safe file writes with error handling
  • Backup creation before overwrites
  • Recovery from partial writes

4. Event System

  • state:saved - State successfully persisted
  • state:loaded - State loaded from disk
  • state:save-error - Persistence failure
  • autosave:triggered - Auto-save initiated

Data Structures

Persisted State

interface PersistedState {
  tasks: Task[];                  // All tasks
  sessions: SessionInfo[];        // Active sessions
  metrics: {
    total_tasks: number;
    completed_tasks: number;
    failed_tasks: number;
    average_completion_time: number;
  };
  last_saved: string;            // ISO timestamp
}

Task Storage

Each task is stored as an individual JSON file:

{
  "id": "task_1234567890",
  "description": "Implement authentication",
  "status": "completed",
  "tool": "CLAUDECODE",
  "created_at": "2024-01-01T10:00:00Z",
  "updated_at": "2024-01-01T11:30:00Z",
  "started_at": "2024-01-01T10:05:00Z",
  "completed_at": "2024-01-01T11:30:00Z",
  "logs": [
    {
      "timestamp": "2024-01-01T10:05:00Z",
      "level": "info",
      "type": "system",
      "message": "Task started"
    }
  ],
  "result": {
    "files_created": ["auth.js", "auth.test.js"],
    "tests_passed": true
  }
}

API Methods

Saving State

// Save complete application state
await persistence.saveState({
  tasks: allTasks,
  sessions: activeSessions,
  metrics: currentMetrics,
  last_saved: new Date().toISOString()
});

// Save individual task
await persistence.saveTask(task);

// Save session log
await persistence.saveSessionLog(sessionId, logContent);

// Save report
const reportPath = await persistence.saveReport(reportId, reportData);

Loading State

// Load application state
const state = await persistence.loadState();
if (state) {
  // Restore tasks and sessions
  restoreFromState(state);
}

// Load all tasks
const tasks = await persistence.loadTasks();

Cleanup Operations

// Delete a task
await persistence.deleteTask(taskId);

// Shutdown gracefully
await persistence.shutdown();

Recovery Mechanisms

1. Startup Recovery

On startup, the system:

  1. Checks for state.json
  2. Falls back to individual task files
  3. Validates data integrity
  4. Rebuilds in-memory state

2. Corruption Handling

If main state is corrupted:

  1. Try loading latest backup
  2. Reconstruct from task files
  3. Log corruption details
  4. Continue with partial state

3. Missing Files

Handles missing files gracefully:

  • Creates directories if needed
  • Initializes empty state
  • Logs missing file warnings
  • Continues operation

Auto-Save System

Configuration

// Default: 30-second intervals
private saveInterval = setInterval(() => {
  this.autoSave();
}, 30000);

Save Triggers

  1. Immediate Save

    • Task creation
    • Task status change
    • Task completion
  2. Interval Save

    • Every 30 seconds
    • Full state snapshot
    • Metrics update
  3. Shutdown Save

    • On graceful shutdown
    • Before process exit
    • Emergency state dump

File Safety

ID Validation

// Sanitize task IDs for filesystem
const safeId = validateTaskId(taskId);
// Removes: /, \, :, *, ?, ", <, >, |

Write Operations

  1. Prepare - Create backup if exists
  2. Write - Atomic write to new file
  3. Replace - Move new over old
  4. Cleanup - Remove old backups

Error Handling

try {
  await fs.writeFile(path, data);
} catch (error) {
  // Log error
  // Emit error event
  // Try backup location
  // Maintain service stability
}

Best Practices

1. State Management

  • Keep state files reasonable size
  • Archive old completed tasks
  • Monitor disk usage
  • Regular backup exports

2. Performance

  • Use individual task files
  • Batch state updates
  • Async I/O operations
  • Minimize file locks

3. Reliability

  • Handle all error cases
  • Validate before saving
  • Test recovery procedures
  • Monitor save failures

4. Security

  • Restrict directory permissions
  • Validate all inputs
  • No sensitive data in logs
  • Encrypt if needed

Monitoring

Health Checks

// Check persistence health
const health = {
  stateDirExists: fs.existsSync(statePath),
  writeable: isWriteable(statePath),
  diskSpace: getDiskSpace(statePath),
  lastSaveTime: getLastSaveTime(),
  backupCount: getBackupCount()
};

Metrics

Track persistence performance:

  • Save duration
  • Load duration
  • Failure rate
  • Disk usage
  • File counts

Alerts

Set up monitoring for:

  • Save failures
  • Disk space low
  • Corruption detected
  • Backup failures
  • Permission errors

Migration & Upgrades

Version Compatibility

State files include version info:

{
  "version": "1.0.0",
  "tasks": [...],
  "sessions": [...]
}

Migration Strategy

  1. Detect Version

    • Check state file version
    • Compare with current
  2. Run Migration

    • Backup current state
    • Apply transformations
    • Validate result
  3. Update Version

    • Save migrated state
    • Update version field
    • Log migration

Rollback Support

Keep pre-migration backups:

state.pre-migration.backup.json
state.migration-v1-to-v2.log

Troubleshooting

Common Issues

  1. Permission Denied

    # Fix permissions
    chmod -R 755 ./coding-agent-state
    
  2. Disk Full

    # Check disk usage
    du -sh ./coding-agent-state
    
    # Clean old files
    npm run clean-state
    
  3. Corrupted State

    # Restore from backup
    cp state.backup.*.json state.json
    

Debug Mode

Enable detailed logging:

DEBUG=persistence:* npm start

Manual Recovery

# List all tasks
ls -la coding-agent-state/tasks/

# Manually edit task
nano coding-agent-state/tasks/task_123.json

# Force state rebuild
rm coding-agent-state/state.json
npm start  # Will rebuild from tasks

Future Enhancements

  1. Database Backend

    • PostgreSQL support
    • Better querying
    • Transactions
    • Replication
  2. Cloud Storage

    • S3 integration
    • Google Cloud Storage
    • Azure Blob Storage
    • Automatic sync
  3. Encryption

    • At-rest encryption
    • Encrypted backups
    • Key management
    • Compliance support
  4. Real-time Sync

    • Multi-instance support
    • Conflict resolution
    • Event streaming
    • Distributed state