Agent Communication Monitoring & Streaming

November 16, 2025

Overview

The SWORDSwarm system provides real-time monitoring of agent-to-agent communication through the Communication Monitor tool. This allows you to observe, analyze, and debug the 4.2M msg/sec binary communication layer in human-readable format.

Features

✅ Live Message Streaming - Real-time display of agent communications
✅ Binary-to-Human Translation - Automatic translation from UFP binary to readable format
✅ Filtering - Filter by agent, division, priority, or message type
✅ Performance Metrics - Track message rates, agent activity, throughput
✅ Message Logging - Save communication logs for replay and analysis
✅ History Replay - Replay saved logs with filtering

Quick Start

Basic Usage

# Stream all agent communications
python3 tools/communication_monitor.py

Filtered Streaming

# Filter by specific agent
python3 tools/communication_monitor.py --agent RUST-INTERNAL-AGENT

# Filter by priority
python3 tools/communication_monitor.py --priority HIGH

# Filter by division
python3 tools/communication_monitor.py --division security

Logging & Replay

# Stream and save to log
python3 tools/communication_monitor.py --output agent_comms.log

# Replay saved log
python3 tools/communication_monitor.py --replay agent_comms.log

# Replay with filter
python3 tools/communication_monitor.py --replay agent_comms.log --agent SECURITY
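
For readers who want to post-process a saved log themselves, replay can be sketched in a few lines of Python. This is a minimal sketch, not the monitor's own implementation: it assumes the log holds one JSON message per line (as the `jq` commands later in this document suggest), and the `replay` function and its `keep` parameter are hypothetical names.

```python
import json
from typing import Callable, Iterable, Iterator, Optional


def replay(lines: Iterable[str],
           keep: Optional[Callable[[dict], bool]] = None) -> Iterator[dict]:
    """Yield logged messages in order, optionally keeping only those accepted by `keep`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        msg = json.loads(line)  # assumption: one JSON object per line
        if keep is None or keep(msg):
            yield msg


# Two sample log lines using the field names from the Message Format section.
sample_log = [
    '{"source_agent": "CONSTRUCTOR", "target_agent": "SECURITY", "type": "TASK_REQUEST"}',
    '{"source_agent": "DIRECTOR", "target_agent": "ARCHITECT", "type": "TASK_REQUEST"}',
]

# Assumption: an agent filter matches either end of the message.
involves_security = lambda m: "SECURITY" in (m["source_agent"], m["target_agent"])
for msg in replay(sample_log, keep=involves_security):
    print(msg["source_agent"], "->", msg["target_agent"], msg["type"])
```

To replay a real file, pass an open file object instead of the sample list, for example `replay(open("agent_comms.log"))`.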

Output Format

The monitor displays messages in a color-coded, human-readable format:

[14:32:15.234] HIGH     TASK_REQUEST
  DIRECTOR → ARCHITECT
  {'task': 'Design microservice architecture', 'priority': 'high'}

[14:32:15.456] HIGH     TASK_RESPONSE
  ARCHITECT → DIRECTOR
  {'status': 'completed', 'result': 'Architecture designed'}

[14:32:16.123] MEDIUM   TASK_REQUEST
  ARCHITECT → CONSTRUCTOR
  {'task': 'Initialize project structure'}

[14:32:16.789] MEDIUM   FEEDBACK
  PYTHON-INTERNAL → CONSTRUCTOR
  {'message': 'Need clarification on module structure'}
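
The layout above can be reproduced from a message dictionary with a small formatter. The sketch below is illustrative only: `format_message` is a hypothetical helper, the field names come from the Message Format section, and it renders timestamps in UTC, whereas the real monitor may use local time.

```python
from datetime import datetime, timezone


def format_message(msg: dict) -> str:
    """Render one message as the three-line block shown in the sample output."""
    ts = datetime.fromtimestamp(msg["timestamp"], tz=timezone.utc)  # assumption: UTC
    stamp = ts.strftime("%H:%M:%S.") + f"{ts.microsecond // 1000:03d}"
    header = f"[{stamp}] {msg['priority']:<8} {msg['type']}"  # priority padded to 8 columns
    route = f"  {msg['source_agent']} → {msg['target_agent']}"
    body = f"  {msg['payload']}"  # Python dict repr, as in the sample output
    return "\n".join([header, route, body])


example = {
    "timestamp": 0.5,
    "priority": "HIGH",
    "type": "TASK_REQUEST",
    "source_agent": "DIRECTOR",
    "target_agent": "ARCHITECT",
    "payload": {"task": "Design microservice architecture"},
}
print(format_message(example))
```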

Color Coding

Priority:

🔴 CRITICAL - Bright red
🟡 HIGH - Yellow
🔵 MEDIUM - Blue
⚪ LOW - Gray

Message Type:

🔷 TASK_REQUEST - Cyan (new task assignment)
🟢 TASK_RESPONSE - Green (task completion)
🟣 FEEDBACK - Magenta (iteration/revision request)
⚪ STATUS_UPDATE - White (progress update)
🟢 AGENT_READY - Green (agent initialization)
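
One way to express this color scheme in code is a pair of lookup tables of ANSI escape sequences. This is a sketch under assumptions: the specific escape codes below are standard ANSI colors chosen to match the names above, not values taken from the monitor's source, and `colorize` is a hypothetical helper.

```python
RESET = "\033[0m"

# Standard ANSI foreground codes picked to match the documented color names.
PRIORITY_COLORS = {
    "CRITICAL": "\033[91m",  # bright red
    "HIGH": "\033[33m",      # yellow
    "MEDIUM": "\033[34m",    # blue
    "LOW": "\033[90m",       # gray
}
TYPE_COLORS = {
    "TASK_REQUEST": "\033[36m",   # cyan
    "TASK_RESPONSE": "\033[32m",  # green
    "FEEDBACK": "\033[35m",       # magenta
    "STATUS_UPDATE": "\033[37m",  # white
    "AGENT_READY": "\033[32m",    # green
}


def colorize(label: str, table: dict) -> str:
    """Wrap `label` in its color code; unknown labels are returned unchanged."""
    code = table.get(label)
    return f"{code}{label}{RESET}" if code else label


print(colorize("CRITICAL", PRIORITY_COLORS), colorize("FEEDBACK", TYPE_COLORS))
```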

Understanding the Communication Flow

Example: Feature Implementation

Here's a typical communication sequence for implementing a Rust microservice:

1. [DIRECTOR → ARCHITECT] TASK_REQUEST (HIGH)
   "Design Rust microservice architecture"

2. [ARCHITECT → DIRECTOR] TASK_RESPONSE (HIGH)
   "Architecture designed: 3-tier with gRPC"

3. [ARCHITECT → CONSTRUCTOR] TASK_REQUEST (MEDIUM)
   "Initialize Rust project with template"

4. [CONSTRUCTOR → RUST-INTERNAL-AGENT] TASK_REQUEST (MEDIUM)
   "Create Rust project structure"

5. [RUST-INTERNAL-AGENT → CONSTRUCTOR] TASK_RESPONSE (MEDIUM)
   "Project initialized with Cargo.toml"

6. [CONSTRUCTOR → SECURITY] TASK_REQUEST (HIGH)
   "Review security of project structure"

7. [SECURITY → CONSTRUCTOR] TASK_RESPONSE (HIGH)
   "Security review passed"

8. [CONSTRUCTOR → TESTBED] TASK_REQUEST (MEDIUM)
   "Generate initial test suite"

9. [TESTBED → CONSTRUCTOR] TASK_RESPONSE (MEDIUM)
   "15 tests generated, 100% passing"

10. [CONSTRUCTOR → DIRECTOR] TASK_RESPONSE (MEDIUM)
    "Project initialized successfully"

Chain of Command Example

Observe how work flows through the organizational hierarchy:

[Worker Request]
RUST-INTERNAL-AGENT → CONSTRUCTOR (Team Lead)
  "Need approval for memory allocator change"

[Team Lead Review]
CONSTRUCTOR → ARCHITECT (Division Head)
  "Worker requests allocator change, forwarding for approval"

[Division Approval]
ARCHITECT → CONSTRUCTOR
  "Approved - proceed with change"

[Delegation]
CONSTRUCTOR → RUST-INTERNAL-AGENT
  "Allocator change approved, proceed with implementation"

[Completion]
RUST-INTERNAL-AGENT → CONSTRUCTOR
  "Allocator changed, benchmarks show 15% improvement"

Special Security Reporting

Monitor direct-to-CSO reporting, which keeps security testing independent of operational management:

[Security Chaos Testing - CSO Direct]
CHAOS-AGENT → CSO (bypasses all other management)
  "Chaos test: simulating network partition"

[CSO Oversight]
CSO → CHAOS-AGENT
  "Proceed with test, report findings"

[Test Results]
CHAOS-AGENT → CSO
  "System maintained availability under partition"

Notice how CHAOS-AGENT never communicates with operational divisions, maintaining test independence.

Performance Metrics

When you stop the monitor (Ctrl+C), it displays statistics:

================================================================================
STATISTICS
================================================================================
Messages:     1,247
Duration:     30.5s
Rate:         40.9 msg/sec
Agents seen:  10

Agents:
  - ARCHITECT
  - CHAOS-AGENT
  - CONSTRUCTOR
  - CSO
  - DEPLOYER
  - DIRECTOR
  - PYTHON-INTERNAL
  - RUST-INTERNAL-AGENT
  - SECURITY
  - TESTBED
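
The same figures can be recomputed from a list of logged messages. The following is a rough sketch, not the monitor's own accounting: `summarize` is a hypothetical helper, and it measures duration as the span between the first and last message timestamps, while the monitor may use wall-clock run time instead.

```python
def summarize(messages: list) -> dict:
    """Compute message count, duration, rate, and the sorted set of agents seen."""
    agents = sorted(
        {m["source_agent"] for m in messages} | {m["target_agent"] for m in messages}
    )
    if not messages:
        return {"messages": 0, "duration_s": 0.0, "rate": 0.0, "agents": agents}
    timestamps = [m["timestamp"] for m in messages]
    # Assumption: duration is the span between first and last message timestamps.
    duration = max(timestamps) - min(timestamps)
    rate = len(messages) / duration if duration > 0 else 0.0
    return {
        "messages": len(messages),
        "duration_s": duration,
        "rate": rate,
        "agents": agents,
    }


sample = [
    {"timestamp": 0.0, "source_agent": "DIRECTOR", "target_agent": "ARCHITECT"},
    {"timestamp": 1.0, "source_agent": "ARCHITECT", "target_agent": "CONSTRUCTOR"},
    {"timestamp": 2.0, "source_agent": "CONSTRUCTOR", "target_agent": "DIRECTOR"},
]
stats = summarize(sample)
print(f"Messages: {stats['messages']}, Rate: {stats['rate']:.1f} msg/sec")
```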

Advanced Usage

Filter by Multiple Criteria

Combine filters for targeted monitoring:

# High priority security messages only
python3 tools/communication_monitor.py \
  --division security \
  --priority HIGH \
  --output security_high.log
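
When filtering a saved log in your own scripts, combined criteria can be modeled as a single predicate. This sketch assumes filters combine with AND semantics and that an agent filter matches either sender or receiver; neither is confirmed here. `MessageFilter` is a hypothetical class, and division filtering is left out because the agent-to-division mapping is not described in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MessageFilter:
    """Hypothetical filter: every criterion that is set must match (AND semantics)."""
    agent: Optional[str] = None
    priority: Optional[str] = None
    msg_type: Optional[str] = None

    def matches(self, msg: dict) -> bool:
        # Assumption: an agent filter matches either the source or the target.
        if self.agent and self.agent not in (msg.get("source_agent"), msg.get("target_agent")):
            return False
        if self.priority and msg.get("priority") != self.priority:
            return False
        if self.msg_type and msg.get("type") != self.msg_type:
            return False
        return True


high_security = MessageFilter(agent="SECURITY", priority="HIGH")
msg = {
    "source_agent": "CONSTRUCTOR",
    "target_agent": "SECURITY",
    "type": "TASK_REQUEST",
    "priority": "HIGH",
}
print(high_security.matches(msg))
```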

Continuous Logging

Run in background and tail the log:

# Start monitoring in background
nohup python3 tools/communication_monitor.py --output comms.log &

# Tail the log
tail -f comms.log | grep CRITICAL  # Only show critical messages

Performance Analysis

Analyze message rates over time:

# Log for 5 minutes
timeout 300 python3 tools/communication_monitor.py --output perf.log

# Analyze (assumes the log holds one JSON message per line)
echo "Total messages:"
wc -l < perf.log

echo "Messages per agent:"
jq -r '.source_agent' perf.log | sort | uniq -c | sort -rn

echo "Messages per priority:"
jq -r '.priority' perf.log | sort | uniq -c
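
The shell commands above give totals; to see how the rate changes over time, messages can be bucketed by second. This is a minimal sketch assuming one JSON message per line with a numeric `timestamp` field in seconds, as in the Message Format section; `messages_per_second` is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Iterable


def messages_per_second(lines: Iterable[str]) -> dict:
    """Count messages in each one-second bucket, keyed by whole-second timestamp."""
    buckets = Counter()
    for line in lines:
        if line.strip():
            buckets[int(json.loads(line)["timestamp"])] += 1
    return dict(sorted(buckets.items()))


sample_lines = [
    '{"timestamp": 10.1, "source_agent": "DIRECTOR"}',
    '{"timestamp": 10.9, "source_agent": "ARCHITECT"}',
    '{"timestamp": 11.2, "source_agent": "CONSTRUCTOR"}',
]
rates = messages_per_second(sample_lines)
peak_second = max(rates, key=rates.get)
print(rates, "peak at", peak_second)
```

For a real log, pass the open file: `messages_per_second(open("perf.log"))`.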

Integration with Binary Protocol

The monitor transparently translates between the ultra-fast binary protocol (UFP) and human-readable JSON:

Binary Layer (UFP):

4.2M messages/second throughput
<200ns P99 latency
Zero-copy shared memory

Monitor Translation:

Real-time binary→JSON conversion
Minimal impact on production performance (typically <0.5% CPU)
Optional read-only tap on the message bus that does not interfere with delivery

Debugging with the Monitor

Scenario 1: Task Not Completing

Problem: Task stuck, not completing

Debug:

python3 tools/communication_monitor.py --agent PROBLEM-AGENT

Look for:

Is agent receiving tasks? (TASK_REQUEST)
Is agent responding? (TASK_RESPONSE)
Is agent requesting feedback? (FEEDBACK)
Where is it stuck in the chain?
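
These checks can be partly automated by pairing requests with responses in a captured log. The sketch below rests on a stated assumption: the message format shown in this document has no correlation ID, so a TASK_RESPONSE from B to A is treated as answering one outstanding TASK_REQUEST from A to B. `pending_requests` is a hypothetical helper.

```python
from collections import Counter


def pending_requests(messages: list) -> dict:
    """Return {(requester, worker): count} for requests that never got a response."""
    pending = Counter()
    for m in messages:
        if m["type"] == "TASK_REQUEST":
            pending[(m["source_agent"], m["target_agent"])] += 1
        elif m["type"] == "TASK_RESPONSE":
            # A response travels in the opposite direction of its request.
            key = (m["target_agent"], m["source_agent"])
            if pending[key] > 0:
                pending[key] -= 1
    return {pair: n for pair, n in pending.items() if n > 0}


trace = [
    {"type": "TASK_REQUEST", "source_agent": "DIRECTOR", "target_agent": "ARCHITECT"},
    {"type": "TASK_RESPONSE", "source_agent": "ARCHITECT", "target_agent": "DIRECTOR"},
    {"type": "TASK_REQUEST", "source_agent": "ARCHITECT", "target_agent": "CONSTRUCTOR"},
]
print(pending_requests(trace))
```

Any pair left in the result marks the point in the chain where a task was handed off but not answered.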

Scenario 2: Performance Bottleneck

Problem: Slow task execution

Debug:

python3 tools/communication_monitor.py --output debug.log
# Let run for representative workload
# Ctrl+C to stop

# Count messages received per agent
jq -r '.target_agent' debug.log | sort | uniq -c | sort -rn

Look for:

Which agent is receiving the most messages?
Are there retry loops? (same message repeated)
Are tasks being delegated properly?
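
Retry loops can be spotted by counting identical messages. The following sketch treats two messages as "the same" when sender, receiver, type, and payload all match, which is an assumption about what counts as a retry; `find_repeats` and its `threshold` parameter are hypothetical.

```python
import json
from collections import Counter


def find_repeats(messages: list, threshold: int = 3) -> list:
    """Return (signature, count) pairs for messages seen at least `threshold` times."""
    counts = Counter(
        (
            m["source_agent"],
            m["target_agent"],
            m["type"],
            json.dumps(m.get("payload"), sort_keys=True),  # stable text form of the payload
        )
        for m in messages
    )
    return [(sig, n) for sig, n in counts.most_common() if n >= threshold]


retry = {
    "source_agent": "CONSTRUCTOR",
    "target_agent": "TESTBED",
    "type": "TASK_REQUEST",
    "payload": {"task": "Generate initial test suite"},
}
other = {
    "source_agent": "DIRECTOR",
    "target_agent": "ARCHITECT",
    "type": "TASK_REQUEST",
    "payload": {"task": "Design"},
}
repeats = find_repeats([retry, other, retry, retry])
for signature, count in repeats:
    print(count, "x", signature[0], "->", signature[1])
```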

Scenario 3: Security Test Contamination

Problem: Security tests affected by operational priorities

Debug:

python3 tools/communication_monitor.py \
  --agent CHAOS-AGENT \
  --output chaos_comms.log

Verify:

All CHAOS-AGENT messages should go to/from CSO only
No messages to/from DIRECTOR, ARCHITECT, etc.
If you see operational messages → contamination detected
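
The verification rule above can be checked mechanically against a captured log. This is a minimal sketch, assuming the messages have already been loaded as dictionaries with the documented `source_agent` and `target_agent` fields; `find_contamination` is a hypothetical helper.

```python
def find_contamination(messages: list,
                       test_agent: str = "CHAOS-AGENT",
                       allowed_peer: str = "CSO") -> list:
    """Return messages where `test_agent` talks to anyone other than `allowed_peer`."""
    violations = []
    for m in messages:
        src, dst = m["source_agent"], m["target_agent"]
        if test_agent not in (src, dst):
            continue  # message does not involve the test agent
        peer = dst if src == test_agent else src
        if peer != allowed_peer:
            violations.append(m)
    return violations


captured = [
    {"source_agent": "CHAOS-AGENT", "target_agent": "CSO"},
    {"source_agent": "CSO", "target_agent": "CHAOS-AGENT"},
    {"source_agent": "DIRECTOR", "target_agent": "CHAOS-AGENT"},
    {"source_agent": "DIRECTOR", "target_agent": "ARCHITECT"},
]
for bad in find_contamination(captured):
    print("contamination:", bad["source_agent"], "->", bad["target_agent"])
```

An empty result means every CHAOS-AGENT message in the capture went to or came from CSO.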

API Integration

You can also programmatically monitor communications:

import asyncio

from tools.communication_monitor import CommunicationMonitor

# Create a monitor filtered to HIGH-priority messages for RUST-INTERNAL-AGENT
monitor = CommunicationMonitor(
    filter_agent="RUST-INTERNAL-AGENT",
    filter_priority="HIGH",
    output_file="rust_high.log"
)

# Stream messages
asyncio.run(monitor.stream_messages())

Message Format

Each message contains:

{
  "message_id": "msg_12345",
  "timestamp": 1700000000.123,
  "source_agent": "RUST-INTERNAL-AGENT",
  "target_agent": "CONSTRUCTOR",
  "type": "TASK_RESPONSE",
  "priority": "MEDIUM",
  "payload": {
    "status": "completed",
    "result": "Implementation finished",
    "metrics": {
      "execution_time_ms": 1234,
      "success": true
    }
  }
}
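
When consuming these messages in your own tooling, it can help to parse them into a typed structure and reject incomplete records early. The sketch below mirrors the fields shown above; the `AgentMessage` class and `parse_message` function are hypothetical and not part of the monitor's API.

```python
import json
from dataclasses import dataclass, field

REQUIRED_FIELDS = (
    "message_id", "timestamp", "source_agent", "target_agent", "type", "priority",
)


@dataclass(frozen=True)
class AgentMessage:
    """Typed view of one logged message (field names taken from the format above)."""
    message_id: str
    timestamp: float
    source_agent: str
    target_agent: str
    type: str
    priority: str
    payload: dict = field(default_factory=dict)


def parse_message(line: str) -> AgentMessage:
    """Parse one JSON message, raising ValueError if a required field is missing."""
    data = json.loads(line)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return AgentMessage(
        **{name: data[name] for name in REQUIRED_FIELDS},
        payload=data.get("payload", {}),
    )


raw = (
    '{"message_id": "msg_12345", "timestamp": 1700000000.123, '
    '"source_agent": "RUST-INTERNAL-AGENT", "target_agent": "CONSTRUCTOR", '
    '"type": "TASK_RESPONSE", "priority": "MEDIUM", '
    '"payload": {"status": "completed"}}'
)
parsed = parse_message(raw)
print(parsed.source_agent, "->", parsed.target_agent, parsed.payload["status"])
```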

Best Practices

During Development

# Monitor your development agents
python3 tools/communication_monitor.py \
  --division software_engineering \
  --output dev.log

During Testing

# Monitor test execution
python3 tools/communication_monitor.py \
  --agent TESTBED \
  --priority HIGH \
  --output tests.log

During Security Audits

# Monitor security operations
python3 tools/communication_monitor.py \
  --division security \
  --output security_audit.log

In Production

# Monitor critical messages only
python3 tools/communication_monitor.py \
  --priority CRITICAL \
  --output production_critical.log

# Basic alerting stand-in: show the latest critical message every 10 seconds
watch -n 10 'tail -1 production_critical.log'
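
For alerting that reacts to every new critical message rather than only the latest line, the log can be polled incrementally. This is a rough sketch under assumptions: the log is written as one JSON message per line, and `poll_new_messages` is a hypothetical helper. It remembers a byte offset between calls and leaves a partially written last line for the next poll.

```python
import json


def poll_new_messages(path: str, offset: int = 0) -> tuple:
    """Return (messages, new_offset) for complete lines appended since `offset`."""
    messages = []
    with open(path, "rb") as fh:
        fh.seek(offset)
        while True:
            line = fh.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a partial line still being written
            offset = fh.tell()
            if line.strip():
                messages.append(json.loads(line))
    return messages, offset
```

A caller would invoke this in a loop, for example every 10 seconds with `time.sleep(10)`, pass the returned offset back in on the next call, and hand each new message to whatever notification channel is in use.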

Troubleshooting

Monitor Shows "SIMULATION mode"

Cause: Live communication system not detected

Solution:

The monitor falls back to simulated traffic for demonstration when it cannot find the live system
To monitor live traffic, make sure the orchestrator is running
Check that the binary communication layer is initialized

No Messages Appearing

Cause: Filters too restrictive or no agent activity

Solution:

Remove filters: python3 tools/communication_monitor.py
Check that agents are active
Verify communication system is running

Too Many Messages

Cause: High agent activity

Solution:

Add filters: --priority HIGH or --agent SPECIFIC-AGENT
Use file logging: --output msgs.log and analyze offline
Monitor specific divisions only

Performance Impact

The monitor has minimal performance impact:

Read-only - Does not interfere with message delivery
Async - Non-blocking translation and display
Optional - Can be disabled completely in production

Typical impact: <0.5% CPU, <100MB RAM