State Persistence
July 1, 2025 · View on GitHub
Overview
State Persistence is a critical system that ensures all tasks, sessions, and application state survive server restarts, crashes, and updates. It provides reliable filesystem-based storage with automatic backups and recovery mechanisms.
Architecture
State Persistence
│
┌─────────────────┼─────────────────┐
│ │ │
Task Files State File Backups
│ │ │
Individual Application Historical
JSON Files Snapshot Snapshots
Storage Structure
Directory Layout
coding-agent-state/ # Default: ./coding-agent-state
├── state.json # Main application state
├── state.backup.*.json # Automatic backups (last 10)
├── tasks/ # Individual task files
│ ├── task_123.json
│ ├── task_456.json
│ └── ...
├── sessions/ # Session data (future use)
├── logs/ # Session logs
│ └── session_*.log
└── reports/ # Generated reports
└── report_*.json
Configurable Location
# Set custom state directory
STATE_PATH=/var/lib/coding-agent npm start
# Or in .env
STATE_PATH=/home/user/.coding-agent-state
Core Features
1. Automatic Persistence
- Tasks saved immediately on creation/update
- State snapshot every 30 seconds
- No data loss on unexpected shutdown
2. Backup Management
- Automatic backups before each save
- Keeps last 10 backup files
- Timestamp-based naming
3. Atomic Operations
- Safe file writes with error handling
- Backup creation before overwrites
- Recovery from partial writes
4. Event System
state:saved- State successfully persistedstate:loaded- State loaded from diskstate:save-error- Persistence failureautosave:triggered- Auto-save initiated
Data Structures
Persisted State
interface PersistedState {
tasks: Task[]; // All tasks
sessions: SessionInfo[]; // Active sessions
metrics: {
total_tasks: number;
completed_tasks: number;
failed_tasks: number;
average_completion_time: number;
};
last_saved: string; // ISO timestamp
}
Task Storage
Each task is stored as an individual JSON file:
{
"id": "task_1234567890",
"description": "Implement authentication",
"status": "completed",
"tool": "CLAUDECODE",
"created_at": "2024-01-01T10:00:00Z",
"updated_at": "2024-01-01T11:30:00Z",
"started_at": "2024-01-01T10:05:00Z",
"completed_at": "2024-01-01T11:30:00Z",
"logs": [
{
"timestamp": "2024-01-01T10:05:00Z",
"level": "info",
"type": "system",
"message": "Task started"
}
],
"result": {
"files_created": ["auth.js", "auth.test.js"],
"tests_passed": true
}
}
API Methods
Saving State
// Save complete application state
await persistence.saveState({
tasks: allTasks,
sessions: activeSessions,
metrics: currentMetrics,
last_saved: new Date().toISOString()
});
// Save individual task
await persistence.saveTask(task);
// Save session log
await persistence.saveSessionLog(sessionId, logContent);
// Save report
const reportPath = await persistence.saveReport(reportId, reportData);
Loading State
// Load application state
const state = await persistence.loadState();
if (state) {
// Restore tasks and sessions
restoreFromState(state);
}
// Load all tasks
const tasks = await persistence.loadTasks();
Cleanup Operations
// Delete a task
await persistence.deleteTask(taskId);
// Shutdown gracefully
await persistence.shutdown();
Recovery Mechanisms
1. Startup Recovery
On startup, the system:
- Checks for
state.json - Falls back to individual task files
- Validates data integrity
- Rebuilds in-memory state
2. Corruption Handling
If main state is corrupted:
- Try loading latest backup
- Reconstruct from task files
- Log corruption details
- Continue with partial state
3. Missing Files
Handles missing files gracefully:
- Creates directories if needed
- Initializes empty state
- Logs missing file warnings
- Continues operation
Auto-Save System
Configuration
// Default: 30-second intervals
private saveInterval = setInterval(() => {
this.autoSave();
}, 30000);
Save Triggers
-
Immediate Save
- Task creation
- Task status change
- Task completion
-
Interval Save
- Every 30 seconds
- Full state snapshot
- Metrics update
-
Shutdown Save
- On graceful shutdown
- Before process exit
- Emergency state dump
File Safety
ID Validation
// Sanitize task IDs for filesystem
const safeId = validateTaskId(taskId);
// Removes: /, \, :, *, ?, ", <, >, |
Write Operations
- Prepare - Create backup if exists
- Write - Atomic write to new file
- Replace - Move new over old
- Cleanup - Remove old backups
Error Handling
try {
await fs.writeFile(path, data);
} catch (error) {
// Log error
// Emit error event
// Try backup location
// Maintain service stability
}
Best Practices
1. State Management
- Keep state files reasonable size
- Archive old completed tasks
- Monitor disk usage
- Regular backup exports
2. Performance
- Use individual task files
- Batch state updates
- Async I/O operations
- Minimize file locks
3. Reliability
- Handle all error cases
- Validate before saving
- Test recovery procedures
- Monitor save failures
4. Security
- Restrict directory permissions
- Validate all inputs
- No sensitive data in logs
- Encrypt if needed
Monitoring
Health Checks
// Check persistence health
const health = {
stateDirExists: fs.existsSync(statePath),
writeable: isWriteable(statePath),
diskSpace: getDiskSpace(statePath),
lastSaveTime: getLastSaveTime(),
backupCount: getBackupCount()
};
Metrics
Track persistence performance:
- Save duration
- Load duration
- Failure rate
- Disk usage
- File counts
Alerts
Set up monitoring for:
- Save failures
- Disk space low
- Corruption detected
- Backup failures
- Permission errors
Migration & Upgrades
Version Compatibility
State files include version info:
{
"version": "1.0.0",
"tasks": [...],
"sessions": [...]
}
Migration Strategy
-
Detect Version
- Check state file version
- Compare with current
-
Run Migration
- Backup current state
- Apply transformations
- Validate result
-
Update Version
- Save migrated state
- Update version field
- Log migration
Rollback Support
Keep pre-migration backups:
state.pre-migration.backup.json
state.migration-v1-to-v2.log
Troubleshooting
Common Issues
-
Permission Denied
# Fix permissions chmod -R 755 ./coding-agent-state -
Disk Full
# Check disk usage du -sh ./coding-agent-state # Clean old files npm run clean-state -
Corrupted State
# Restore from backup cp state.backup.*.json state.json
Debug Mode
Enable detailed logging:
DEBUG=persistence:* npm start
Manual Recovery
# List all tasks
ls -la coding-agent-state/tasks/
# Manually edit task
nano coding-agent-state/tasks/task_123.json
# Force state rebuild
rm coding-agent-state/state.json
npm start # Will rebuild from tasks
Future Enhancements
-
Database Backend
- PostgreSQL support
- Better querying
- Transactions
- Replication
-
Cloud Storage
- S3 integration
- Google Cloud Storage
- Azure Blob Storage
- Automatic sync
-
Encryption
- At-rest encryption
- Encrypted backups
- Key management
- Compliance support
-
Real-time Sync
- Multi-instance support
- Conflict resolution
- Event streaming
- Distributed state