Testing Framework
July 1, 2025
Overview
The SystemPrompt Coding Agent includes an end-to-end (E2E) testing framework built for MCP (Model Context Protocol) servers. The framework validates the complete AI agent orchestration flow, from task creation through completion.
Architecture
Test Runner
│
├── MCP Client
│   ├── HTTP Transport
│   └── Notification Handlers
│
├── Test Reporter
│   ├── HTML Reports
│   └── Markdown Reports
│
└── Test Utils
    ├── Environment Detection
    ├── Logging
    └── Assertions
Core Components
1. Test Runner
Main test orchestration and execution.
Features:
- Sequential test execution
- Error handling and recovery
- Timeout management
- Result aggregation
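The four runner features listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's actual runner: `TestCase`, `TestResult`, `withTimeout`, and `runSequentially` are hypothetical names introduced here.

```typescript
// Hypothetical types for illustration; the real runner may track more detail.
type TestCase = { name: string; run: () => Promise<void> };
type TestResult = { name: string; passed: boolean; durationMs: number; error?: string };

// Reject if `work` does not settle within `ms` milliseconds.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer); // Stop the timer once either promise has settled
  }
}

// Run tests one at a time; a failure is recorded and the next test still runs.
async function runSequentially(tests: TestCase[], timeoutMs: number): Promise<TestResult[]> {
  const results: TestResult[] = [];
  for (const test of tests) {
    const started = Date.now();
    try {
      await withTimeout(test.run(), timeoutMs, test.name);
      results.push({ name: test.name, passed: true, durationMs: Date.now() - started });
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      results.push({ name: test.name, passed: false, durationMs: Date.now() - started, error: message });
    }
  }
  return results;
}
```

Note that the timeout only stops waiting for a test. It does not cancel the underlying operation, so a timed-out test may still need explicit cleanup.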
2. MCP Client Integration
Full MCP protocol client for testing.
Capabilities:
- Tool invocation
- Resource reading
- Notification handling
- Progress tracking
3. Test Reporter
Comprehensive test reporting system.
Output Formats:
- HTML Reports - Interactive, styled reports
- Markdown Reports - Git-friendly text reports
- Console Output - Real-time test progress
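To make the Markdown output format concrete, here is a rough sketch of how a list of results could be rendered as a Git-friendly report. The `ReportEntry` shape, the `renderMarkdownReport` name, and the table layout are assumptions for illustration; the real reporter's output may look different.

```typescript
// Hypothetical result shape; the real reporter likely records logs and notifications too.
type ReportEntry = { name: string; passed: boolean; durationMs: number };

// Render a summary line followed by one table row per test.
function renderMarkdownReport(entries: ReportEntry[]): string {
  const passed = entries.filter((entry) => entry.passed).length;
  const rows = entries.map(
    (entry) => `| ${entry.name} | ${entry.passed ? 'PASS' : 'FAIL'} | ${entry.durationMs} |`
  );
  return [
    '# E2E Test Report',
    '',
    `${passed} of ${entries.length} tests passed`,
    '',
    '| Test | Result | Duration (ms) |',
    '| --- | --- | --- |',
    ...rows,
  ].join('\n');
}
```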
4. Test Utilities
Helper functions and common patterns.
Includes:
- Environment configuration
- URL detection (local/tunnel)
- Assertion helpers
- Timing utilities
Test Structure
Basic Test Pattern
async function testCreateTaskFlow(
  client: Client,
  reporter: TestReporter
): Promise<void> {
  // 1. Setup
  const timestamp = Date.now();
  const branchName = `e2e-test-${timestamp}`;

  // 2. Execute
  const result = await client.callTool({
    name: 'create_task',
    arguments: {
      tool: 'CLAUDECODE',
      branch: branchName,
      instructions: 'Create hello.html'
    }
  });

  // 3. Verify
  const text = result.content?.[0]?.text ?? '';
  if (text.includes('created')) {
    reporter.addSuccess('Task created successfully');
  } else {
    reporter.addError('Task creation failed');
    return; // No task was created, so there is nothing to clean up
  }

  // 4. Cleanup
  // extractTaskId is a hypothetical helper: how the task ID is encoded in the
  // response depends on the server, so adapt the parsing to its actual format.
  const taskId = extractTaskId(text);
  await client.callTool({
    name: 'end_task',
    arguments: { task_id: taskId }
  });
}
Notification Handling
// Schema exported by the MCP TypeScript SDK
import { ResourceUpdatedNotificationSchema } from '@modelcontextprotocol/sdk/types.js';

// Set up notification handlers
client.setNotificationHandler(
  ResourceUpdatedNotificationSchema,
  async (notification) => {
    const { uri } = notification.params;
    // React to task updates
    if (uri.startsWith('task://')) {
      const resource = await client.readResource({ uri });
      const task = JSON.parse(resource.contents[0].text);
      // Track progress (taskId is the ID of the task under test)
      reporter.addLog(
        taskId,
        `Status: ${task.status}, Progress: ${task.progress}%`
      );
    }
  }
);
Running Tests
Local Testing
# Run against local server
npm run test:e2e
Tunnel Testing
# Terminal 1: Start server with tunnel
npm run tunnel
# Terminal 2: Run tests against tunnel
npm run test:tunnel
Environment Variables
# .env configuration
MCP_BASE_URL=http://localhost:3000 # Override base URL
TUNNEL_MODE=true # Enable tunnel detection
TEST_TIMEOUT=120000 # Test timeout (ms)
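The variables above could be read into a typed configuration object along these lines. This is a sketch, not the framework's actual loader: `TestConfig` and `loadTestConfig` are hypothetical names, and the fallback values simply mirror the example `.env` shown above.

```typescript
type TestConfig = { baseUrl: string; tunnelMode: boolean; timeoutMs: number };

// `env` is passed in (for example process.env) so the function is easy to test.
function loadTestConfig(env: Record<string, string | undefined>): TestConfig {
  const rawTimeout = env['TEST_TIMEOUT'] ?? '120000'; // Assumed default, taken from the example above
  const timeoutMs = Number(rawTimeout);
  if (!Number.isFinite(timeoutMs) || timeoutMs <= 0) {
    throw new Error(`TEST_TIMEOUT must be a positive number of milliseconds, got "${rawTimeout}"`);
  }
  return {
    baseUrl: env['MCP_BASE_URL'] ?? 'http://localhost:3000', // Assumed default: the local server
    tunnelMode: env['TUNNEL_MODE'] === 'true',
    timeoutMs,
  };
}
```

Failing fast on a malformed `TEST_TIMEOUT` avoids a confusing situation where every test times out immediately because the value parsed to `NaN`.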
Test Reports
HTML Report Features
- Summary Dashboard - Pass/fail statistics
- Timeline View - Execution timeline
- Detailed Logs - Step-by-step execution
- Notification History - All MCP notifications
- Error Details - Stack traces and context
Report Location
e2e-test/typescript/test-reports/
├── report-2024-12-20T10-30-45.html
├── report-2024-12-20T10-30-45.md
└── latest.html -> report-2024-12-20T10-30-45.html
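The timestamped file names in the listing above follow an ISO-like pattern with colons replaced by hyphens. One way to produce that pattern is sketched below; `reportBaseName` is a hypothetical helper, and whether the framework uses UTC or local time is an assumption (this sketch uses UTC).

```typescript
// Build a file-system-safe name such as report-2024-12-20T10-30-45 (UTC).
function reportBaseName(now: Date): string {
  const stamp = now
    .toISOString()      // e.g. 2024-12-20T10:30:45.000Z
    .slice(0, 19)       // drop milliseconds and the trailing "Z"
    .replace(/:/g, '-'); // colons are not allowed in file names on some systems
  return `report-${stamp}`;
}
```

Appending `.html` or `.md` to the returned base name gives the two report files for a run.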
Writing New Tests
1. Create Test Function
async function testNewFeature(
  client: Client,
  reporter: TestReporter
): Promise<void> {
  const test = reporter.startTest('New Feature Test');
  try {
    // Your test logic here
    test.pass('Feature works correctly');
  } catch (error) {
    // `error` is typed `unknown` in strict TypeScript, so narrow it first
    const message = error instanceof Error ? error.message : String(error);
    test.fail(`Feature failed: ${message}`);
    throw error;
  }
}
2. Add to Test Suite
// In test-e2e.ts
const tests = [
testCreateTaskFlow,
testNewFeature, // Add your test
// ... other tests
];
3. Use Test Utilities
import {
  createMCPClient,
  log,
  sleep,
  waitForCondition
} from './utils/test-utils.js';

// Wait for task completion
await waitForCondition(
  async () => {
    const task = await getTask(taskId);
    return task.status === 'completed';
  },
  { timeout: 60000, interval: 2000 }
);
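The implementation of `waitForCondition` is not shown in this document. A helper with the same call shape might look roughly like the following; `pollUntil` and `WaitOptions` are hypothetical names used here to avoid implying that this is the framework's code.

```typescript
type WaitOptions = { timeout: number; interval: number }; // both in milliseconds

const pause = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Poll `check` until it resolves to true, or throw once `timeout` ms have passed.
async function pollUntil(check: () => Promise<boolean>, options: WaitOptions): Promise<void> {
  const deadline = Date.now() + options.timeout;
  for (;;) {
    if (await check()) return;        // Condition met
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${options.timeout} ms`);
    }
    await pause(options.interval);    // Wait before the next poll
  }
}
```

The condition is always checked at least once, even with a zero timeout, so an already-completed task returns immediately.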
Best Practices
1. Test Isolation
- Use unique branch names with timestamps
- Clean up resources after tests
- Don't depend on previous test state
2. Timeout Management
- Set appropriate timeouts for AI operations
- Use shorter timeouts for quick operations
- Implement retry logic for flaky operations
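Retry logic for flaky operations can be kept in one small helper. The sketch below uses exponential backoff; `retry` and its parameters are hypothetical and not part of the framework's documented utilities.

```typescript
// Retry `operation` up to `attempts` times, doubling the delay after each failure.
async function retry<T>(
  operation: () => Promise<T>,
  attempts: number,
  baseDelayMs: number
): Promise<T> {
  if (attempts < 1) throw new RangeError('attempts must be at least 1');
  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // Backoff: baseDelayMs, then 2x, 4x, ...
        const delay = baseDelayMs * Math.pow(2, attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // All attempts failed; surface the most recent error
}
```

Retries suit transient failures such as a connection reset. Wrapping an assertion in `retry` can hide a real bug, so it is usually better to retry only the network call.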
3. Assertion Strategy
- Verify both success responses and side effects
- Check resource states match expectations
- Validate notification sequences
4. Error Handling
- Catch and report all errors
- Include context in error messages
- Clean up even on failure
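"Clean up even on failure" maps naturally onto `try`/`finally`. The helper below is a sketch under that assumption; `withCleanup` and `onCleanupError` are hypothetical names, not framework APIs.

```typescript
// Run `body`, then always run `cleanup`, whether `body` succeeded or threw.
// A failing cleanup is reported through `onCleanupError` instead of being thrown,
// so it cannot mask the original test error.
async function withCleanup<T>(
  body: () => Promise<T>,
  cleanup: () => Promise<void>,
  onCleanupError?: (error: unknown) => void
): Promise<T> {
  try {
    return await body();
  } finally {
    try {
      await cleanup();
    } catch (error) {
      if (onCleanupError) onCleanupError(error);
    }
  }
}
```

In a test, `body` would hold the test steps and `cleanup` would end the task, for example by calling the `end_task` tool.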
5. Reporting
- Log all significant events
- Include timing information
- Capture notification data
Common Test Scenarios
1. Task Creation and Completion
// Create task
const createResult = await client.callTool({
  name: 'create_task',
  arguments: {
    tool: 'CLAUDECODE',
    instructions: 'Implement authentication'
  }
});
// taskId is the ID of the new task, read from createResult
// (how it is encoded in the response depends on the server)

// Wait for completion
await waitForTaskCompletion(client, taskId);

// Verify results
const task = await client.readResource({
  uri: `task://${taskId}`
});
2. Progress Monitoring
// Track progress updates
const progressUpdates: number[] = [];
client.setNotificationHandler(
  ResourceUpdatedNotificationSchema,
  (notif) => {
    if (notif.params.uri === `task://${taskId}`) {
      const task = JSON.parse(/* ... */);
      progressUpdates.push(task.progress);
    }
  }
);

// Verify progress increments (run this only after the task has completed,
// otherwise no notifications will have arrived yet)
expect(progressUpdates).toEqual([0, 25, 50, 75, 100]);
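An exact-sequence assertion like the one above breaks as soon as the server reports a different number of progress steps. A looser check that only requires progress to stay in range, never decrease, and end at 100 may be more robust. The helper below is a hypothetical sketch; whether the server always reports a final value of 100 is an assumption.

```typescript
// True when every value is within 0..100, values never decrease,
// and the last reported value is 100.
function isValidProgressSequence(updates: number[]): boolean {
  let previous = -Infinity;
  for (const value of updates) {
    if (value < 0 || value > 100 || value < previous) return false;
    previous = value;
  }
  return previous === 100; // Also false for an empty list
}
```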
3. Error Scenarios
// Test invalid inputs
const errorResult = await client.callTool({
  name: 'create_task',
  arguments: {
    tool: 'INVALID_TOOL',
    instructions: 'Test error'
  }
});
expect(errorResult.isError).toBe(true);
expect(errorResult.content[0].text).toContain('error');
Debugging Tests
Enable Verbose Logging
// Set debug level
process.env.LOG_LEVEL = 'debug';
// Add custom logging
log.debug('Task state:', taskState);
log.info('Notification received:', notification);
Inspect MCP Traffic
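// Log all MCP requests/responses
// This assumes the client emits 'request' and 'response' events;
// verify that your client does before relying on it.
client.on('request', (req) => {
  console.log('MCP Request:', JSON.stringify(req, null, 2));
});
client.on('response', (res) => {
  console.log('MCP Response:', JSON.stringify(res, null, 2));
});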
Save Test Artifacts
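import * as fs from 'node:fs';

// Save task details for debugging
const taskDetails = await client.readResource({
  uri: `task://${taskId}`
});
// Create the output directory if it does not exist yet
fs.mkdirSync('test-artifacts', { recursive: true });
fs.writeFileSync(
  `test-artifacts/task-${taskId}.json`,
  taskDetails.contents[0].text
);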
CI/CD Integration
GitHub Actions Example
name: E2E Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18'
      - name: Install dependencies
        run: npm ci
      - name: Start server
        run: |
          docker compose up -d
          npm run wait-for-ready
      - name: Run E2E tests
        run: npm run test:e2e
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - name: Upload reports
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: test-reports
          path: e2e-test/typescript/test-reports/
Performance Testing
Measure Operation Times
const timer = reporter.startTimer('Task Creation');
const result = await client.callTool({
  name: 'create_task',
  arguments: { /* ... */ }
});
timer.end();
reporter.addMetric('task_creation_time', timer.duration);
Load Testing
// Parallel task creation
const startedAt = Date.now();
const tasks = await Promise.all(
  Array.from({ length: 10 }, (_, i) =>
    client.callTool({
      name: 'create_task',
      arguments: {
        tool: 'CLAUDECODE',
        branch: `load-test-${i}`,
        instructions: 'Simple task'
      }
    })
  )
);
const elapsedSeconds = (Date.now() - startedAt) / 1000;

// Measure throughput
reporter.addMetric('tasks_per_second', tasks.length / elapsedSeconds);