Testing Framework

July 1, 2025 · View on GitHub

Overview

The SystemPrompt Coding Agent includes a comprehensive End-to-End (E2E) testing framework built specifically for testing MCP (Model Context Protocol) servers. The framework validates the complete flow of AI agent orchestration, from task creation through completion.

Architecture

Test Runner

    ├── MCP Client
    │   ├── HTTP Transport
    │   └── Notification Handlers

    ├── Test Reporter
    │   ├── HTML Reports
    │   └── Markdown Reports

    └── Test Utils
        ├── Environment Detection
        ├── Logging
        └── Assertions

Core Components

1. Test Runner

Main test orchestration and execution.

Features:

  • Sequential test execution
  • Error handling and recovery
  • Timeout management
  • Result aggregation

2. MCP Client Integration

Full MCP protocol client for testing.

Capabilities:

  • Tool invocation
  • Resource reading
  • Notification handling
  • Progress tracking

3. Test Reporter

Comprehensive test reporting system.

Output Formats:

  • HTML Reports - Interactive, styled reports
  • Markdown Reports - Git-friendly text reports
  • Console Output - Real-time test progress

4. Test Utilities

Helper functions and common patterns.

Includes:

  • Environment configuration
  • URL detection (local/tunnel)
  • Assertion helpers
  • Timing utilities

Test Structure

Basic Test Pattern

async function testCreateTaskFlow(
  client: Client, 
  reporter: TestReporter
): Promise<void> {
  // 1. Setup
  const timestamp = Date.now();
  const branchName = `e2e-test-${timestamp}`;
  
  // 2. Execute
  const result = await client.callTool({
    name: 'create_task',
    arguments: {
      tool: 'CLAUDECODE',
      branch: branchName,
      instructions: 'Create hello.html'
    }
  });
  
  // 3. Verify
  if (result.content?.[0]?.text?.includes('created')) {
    reporter.addSuccess('Task created successfully');
  } else {
    reporter.addError('Task creation failed');
  }
  
  // 4. Cleanup
  await client.callTool({
    name: 'end_task',
    arguments: { task_id: taskId }
  });
}

Notification Handling

// Set up notification handlers
client.setNotificationHandler(
  ResourceUpdatedNotificationSchema,
  async (notification) => {
    const { uri } = notification.params;
    
    // React to task updates
    if (uri.startsWith('task://')) {
      const resource = await client.readResource({ uri });
      const task = JSON.parse(resource.contents[0].text);
      
      // Track progress
      reporter.addLog(
        taskId,
        `Status: ${task.status}, Progress: ${task.progress}%`
      );
    }
  }
);

Running Tests

Local Testing

# Run against local server
npm run test:e2e

Tunnel Testing

# Terminal 1: Start server with tunnel
npm run tunnel

# Terminal 2: Run tests against tunnel
npm run test:tunnel

Environment Variables

# .env configuration
MCP_BASE_URL=http://localhost:3000  # Override base URL
TUNNEL_MODE=true                     # Enable tunnel detection
TEST_TIMEOUT=120000                  # Test timeout (ms)

Test Reports

HTML Report Features

  • Summary Dashboard - Pass/fail statistics
  • Timeline View - Execution timeline
  • Detailed Logs - Step-by-step execution
  • Notification History - All MCP notifications
  • Error Details - Stack traces and context

Report Location

e2e-test/typescript/test-reports/
├── report-2024-12-20T10-30-45.html
├── report-2024-12-20T10-30-45.md
└── latest.html -> report-2024-12-20T10-30-45.html

Writing New Tests

1. Create Test Function

async function testNewFeature(
  client: Client,
  reporter: TestReporter
): Promise<void> {
  const test = reporter.startTest('New Feature Test');
  
  try {
    // Your test logic here
    test.pass('Feature works correctly');
  } catch (error) {
    test.fail(`Feature failed: ${error.message}`);
    throw error;
  }
}

2. Add to Test Suite

// In test-e2e.ts
const tests = [
  testCreateTaskFlow,
  testNewFeature,  // Add your test
  // ... other tests
];

3. Use Test Utilities

import { 
  createMCPClient, 
  log, 
  sleep,
  waitForCondition 
} from './utils/test-utils.js';

// Wait for task completion
await waitForCondition(
  async () => {
    const task = await getTask(taskId);
    return task.status === 'completed';
  },
  { timeout: 60000, interval: 2000 }
);

Best Practices

1. Test Isolation

  • Use unique branch names with timestamps
  • Clean up resources after tests
  • Don't depend on previous test state

2. Timeout Management

  • Set appropriate timeouts for AI operations
  • Use shorter timeouts for quick operations
  • Implement retry logic for flaky operations

3. Assertion Strategy

  • Verify both success responses and side effects
  • Check resource states match expectations
  • Validate notification sequences

4. Error Handling

  • Catch and report all errors
  • Include context in error messages
  • Clean up even on failure

5. Reporting

  • Log all significant events
  • Include timing information
  • Capture notification data

Common Test Scenarios

1. Task Creation and Completion

// Create task
const createResult = await client.callTool({
  name: 'create_task',
  arguments: {
    tool: 'CLAUDECODE',
    instructions: 'Implement authentication'
  }
});

// Wait for completion
await waitForTaskCompletion(client, taskId);

// Verify results
const task = await client.readResource({
  uri: `task://${taskId}`
});

2. Progress Monitoring

// Track progress updates
const progressUpdates: number[] = [];

client.setNotificationHandler(
  ResourceUpdatedNotificationSchema,
  (notif) => {
    if (notif.params.uri === `task://${taskId}`) {
      const task = JSON.parse(/* ... */);
      progressUpdates.push(task.progress);
    }
  }
);

// Verify progress increments
expect(progressUpdates).toEqual([0, 25, 50, 75, 100]);

3. Error Scenarios

// Test invalid inputs
const errorResult = await client.callTool({
  name: 'create_task',
  arguments: {
    tool: 'INVALID_TOOL',
    instructions: 'Test error'
  }
});

expect(errorResult.isError).toBe(true);
expect(errorResult.content[0].text).toContain('error');

Debugging Tests

Enable Verbose Logging

// Set debug level
process.env.LOG_LEVEL = 'debug';

// Add custom logging
log.debug('Task state:', taskState);
log.info('Notification received:', notification);

Inspect MCP Traffic

// Log all MCP requests/responses
client.on('request', (req) => {
  console.log('MCP Request:', JSON.stringify(req, null, 2));
});

client.on('response', (res) => {
  console.log('MCP Response:', JSON.stringify(res, null, 2));
});

Save Test Artifacts

// Save task details for debugging
const taskDetails = await client.readResource({
  uri: `task://${taskId}`
});

fs.writeFileSync(
  `test-artifacts/task-${taskId}.json`,
  taskDetails.contents[0].text
);

CI/CD Integration

GitHub Actions Example

name: E2E Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
      
      - name: Install dependencies
        run: npm ci
      
      - name: Start server
        run: |
          docker-compose up -d
          npm run wait-for-ready
      
      - name: Run E2E tests
        run: npm run test:e2e
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      
      - name: Upload reports
        if: always()
        uses: actions/upload-artifact@v3
        with:
          name: test-reports
          path: e2e-test/typescript/test-reports/

Performance Testing

Measure Operation Times

const timer = reporter.startTimer('Task Creation');
const result = await client.callTool({
  name: 'create_task',
  arguments: { /* ... */ }
});
timer.end();

reporter.addMetric('task_creation_time', timer.duration);

Load Testing

// Parallel task creation
const tasks = await Promise.all(
  Array(10).fill(0).map((_, i) => 
    client.callTool({
      name: 'create_task',
      arguments: {
        branch: `load-test-${i}`,
        instructions: 'Simple task'
      }
    })
  )
);

// Measure throughput
reporter.addMetric('tasks_per_second', 10 / elapsedSeconds);