Agent Communication Monitoring & Streaming
November 16, 2025
Overview
The SWORDSwarm system provides real-time monitoring of agent-to-agent communication through the Communication Monitor tool. This allows you to observe, analyze, and debug the 4.2M msg/sec binary communication layer in human-readable format.
Features
- ✅ Live Message Streaming - Real-time display of agent communications
- ✅ Binary-to-Human Translation - Automatic translation from UFP binary to readable format
- ✅ Filtering - Filter by agent, division, priority, or message type
- ✅ Performance Metrics - Track message rates, agent activity, throughput
- ✅ Message Logging - Save communication logs for replay and analysis
- ✅ History Replay - Replay saved logs with filtering
Quick Start
Basic Usage
# Stream all agent communications
python3 tools/communication_monitor.py
Filtered Streaming
# Filter by specific agent
python3 tools/communication_monitor.py --agent RUST-INTERNAL-AGENT
# Filter by priority
python3 tools/communication_monitor.py --priority HIGH
# Filter by division
python3 tools/communication_monitor.py --division security
Logging & Replay
# Stream and save to log
python3 tools/communication_monitor.py --output agent_comms.log
# Replay saved log
python3 tools/communication_monitor.py --replay agent_comms.log
# Replay with filter
python3 tools/communication_monitor.py --replay agent_comms.log --agent SECURITY
Output Format
The monitor displays messages in a color-coded, human-readable format:
[14:32:15.234] HIGH TASK_REQUEST
DIRECTOR → ARCHITECT
{'task': 'Design microservice architecture', 'priority': 'high'}
[14:32:15.456] HIGH TASK_RESPONSE
ARCHITECT → DIRECTOR
{'status': 'completed', 'result': 'Architecture designed'}
[14:32:16.123] MEDIUM TASK_REQUEST
ARCHITECT → CONSTRUCTOR
{'task': 'Initialize project structure'}
[14:32:16.789] MEDIUM FEEDBACK
PYTHON-INTERNAL → CONSTRUCTOR
{'message': 'Need clarification on module structure'}
Color Coding
Priority:
- 🔴 CRITICAL - Bright red
- 🟡 HIGH - Yellow
- 🔵 MEDIUM - Blue
- ⚪ LOW - Gray
Message Type:
- 🔷 TASK_REQUEST - Cyan (new task assignment)
- 🟢 TASK_RESPONSE - Green (task completion)
- 🟣 FEEDBACK - Magenta (iteration/revision request)
- ⚪ STATUS_UPDATE - White (progress update)
- 🟢 AGENT_READY - Green (agent initialization)
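The layout and color scheme above can be sketched as a small formatter. This is an illustrative reconstruction, not the monitor's actual rendering code: the function name `format_message`, the specific ANSI escape codes, and the use of UTC for the timestamp are all assumptions (the real tool may well print local time).

```python
from datetime import datetime, timezone

# Assumed ANSI codes approximating the colors listed above.
PRIORITY_COLORS = {
    "CRITICAL": "\033[91m",  # bright red
    "HIGH": "\033[33m",      # yellow
    "MEDIUM": "\033[34m",    # blue
    "LOW": "\033[90m",       # gray
}
TYPE_COLORS = {
    "TASK_REQUEST": "\033[36m",   # cyan
    "TASK_RESPONSE": "\033[32m",  # green
    "FEEDBACK": "\033[35m",       # magenta
    "STATUS_UPDATE": "\033[37m",  # white
    "AGENT_READY": "\033[32m",    # green
}
RESET = "\033[0m"


def format_message(msg: dict, color: bool = True) -> str:
    """Render one message dict in the three-line layout shown above."""
    ts = datetime.fromtimestamp(msg["timestamp"], tz=timezone.utc)
    stamp = ts.strftime("%H:%M:%S.") + f"{ts.microsecond // 1000:03d}"
    prio, mtype = msg["priority"], msg["type"]
    if color:
        # Unknown priorities or types fall back to uncolored text.
        prio = f"{PRIORITY_COLORS.get(prio, '')}{prio}{RESET}"
        mtype = f"{TYPE_COLORS.get(mtype, '')}{mtype}{RESET}"
    return (
        f"[{stamp}] {prio} {mtype}\n"
        f"{msg['source_agent']} → {msg['target_agent']}\n"
        f"{msg['payload']}"
    )
```

Passing `color=False` gives plain text, which is convenient when the output is redirected to a file or piped through `grep`.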
Understanding the Communication Flow
Example: Feature Implementation
Here's a typical communication sequence for implementing a Rust microservice:
1. [DIRECTOR → ARCHITECT] TASK_REQUEST (HIGH)
"Design Rust microservice architecture"
2. [ARCHITECT → DIRECTOR] TASK_RESPONSE (HIGH)
"Architecture designed: 3-tier with gRPC"
3. [ARCHITECT → CONSTRUCTOR] TASK_REQUEST (MEDIUM)
"Initialize Rust project with template"
4. [CONSTRUCTOR → RUST-INTERNAL-AGENT] TASK_REQUEST (MEDIUM)
"Create Rust project structure"
5. [RUST-INTERNAL-AGENT → CONSTRUCTOR] TASK_RESPONSE (MEDIUM)
"Project initialized with Cargo.toml"
6. [CONSTRUCTOR → SECURITY] TASK_REQUEST (HIGH)
"Review security of project structure"
7. [SECURITY → CONSTRUCTOR] TASK_RESPONSE (HIGH)
"Security review passed"
8. [CONSTRUCTOR → TESTBED] TASK_REQUEST (MEDIUM)
"Generate initial test suite"
9. [TESTBED → CONSTRUCTOR] TASK_RESPONSE (MEDIUM)
"15 tests generated, 100% passing"
10. [CONSTRUCTOR → DIRECTOR] TASK_RESPONSE (MEDIUM)
"Project initialized successfully"
Chain of Command Example
Observe how work flows through the organizational hierarchy:
[Worker Request]
RUST-INTERNAL-AGENT → CONSTRUCTOR (Team Lead)
"Need approval for memory allocator change"
[Team Lead Review]
CONSTRUCTOR → ARCHITECT (Division Head)
"Worker requests allocator change, forwarding for approval"
[Division Approval]
ARCHITECT → CONSTRUCTOR
"Approved - proceed with change"
[Delegation]
CONSTRUCTOR → RUST-INTERNAL-AGENT
"Allocator change approved, proceed with implementation"
[Completion]
RUST-INTERNAL-AGENT → CONSTRUCTOR
"Allocator changed, benchmarks show 15% improvement"
Special Security Reporting
Security testing agents report directly to the CSO. Monitor this channel to confirm that they stay independent of operational management:
[Security Chaos Testing - CSO Direct]
CHAOS-AGENT → CSO (bypasses all other management)
"Chaos test: simulating network partition"
[CSO Oversight]
CSO → CHAOS-AGENT
"Proceed with test, report findings"
[Test Results]
CHAOS-AGENT → CSO
"System maintained availability under partition"
Notice how CHAOS-AGENT never communicates with operational divisions, maintaining test independence.
Performance Metrics
When you stop the monitor (Ctrl+C), it displays statistics:
================================================================================
STATISTICS
================================================================================
Messages: 1,247
Duration: 30.5s
Rate: 40.9 msg/sec
Agents seen: 10
Agents:
- ARCHITECT
- CHAOS-AGENT
- CONSTRUCTOR
- CSO
- DEPLOYER
- DIRECTOR
- PYTHON-INTERNAL
- RUST-INTERNAL-AGENT
- SECURITY
- TESTBED
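The rate in this summary is messages divided by duration (1,247 / 30.5 s ≈ 40.9 msg/sec). The sketch below computes the same figures from a list of message dicts. It assumes the field names from the Message Format section and derives the duration from message timestamps, whereas the monitor itself may measure wall-clock runtime. The name `summarize` is hypothetical.

```python
def summarize(messages: list[dict]) -> dict:
    """Compute message count, duration, rate, and the set of agents seen."""
    if not messages:
        return {"messages": 0, "duration_s": 0.0, "rate": 0.0, "agents": []}
    stamps = [m["timestamp"] for m in messages]
    duration = max(stamps) - min(stamps)
    # An agent counts as "seen" whether it sent or received a message.
    agents = sorted(
        {m["source_agent"] for m in messages}
        | {m["target_agent"] for m in messages}
    )
    rate = len(messages) / duration if duration > 0 else 0.0
    return {
        "messages": len(messages),
        "duration_s": duration,
        "rate": rate,
        "agents": agents,
    }
```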
Advanced Usage
Filter by Multiple Criteria
Combine filters for targeted monitoring:
# High priority security messages only
python3 tools/communication_monitor.py \
--division security \
--priority HIGH \
--output security_high.log
Continuous Logging
Run in background and tail the log:
# Start monitoring in background
nohup python3 tools/communication_monitor.py --output comms.log &
# Tail the log
tail -f comms.log | grep CRITICAL # Only show critical messages
Performance Analysis
Analyze message rates over time:
# Log for 5 minutes; SIGINT mirrors Ctrl+C so the monitor can exit cleanly
timeout --signal=INT 300 python3 tools/communication_monitor.py --output perf.log
# Analyze
echo "Total messages:"
wc -l perf.log
echo "Messages per agent:"
cat perf.log | jq -r '.source_agent' | sort | uniq -c | sort -rn
echo "Messages per priority:"
cat perf.log | jq -r '.priority' | sort | uniq -c
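The commands above give totals, not rates over time. A minimal sketch of a time-bucketed view follows, assuming the log is JSON Lines (one message object per line, as the `jq` usage implies) with the fields from the Message Format section. The helper names `load_messages` and `rate_per_interval` are hypothetical.

```python
import json
from collections import Counter


def load_messages(lines) -> list[dict]:
    """Parse JSON Lines input, skipping blank or malformed lines."""
    messages = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            messages.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # tolerate a partially written last line
    return messages


def rate_per_interval(messages: list[dict], interval_s: float = 60.0) -> dict:
    """Count messages per fixed-width window, keyed by window start offset (s)."""
    if not messages:
        return {}
    start = min(m["timestamp"] for m in messages)
    buckets = Counter(
        int((m["timestamp"] - start) // interval_s) for m in messages
    )
    return {i * interval_s: buckets[i] for i in sorted(buckets)}
```

Typical use would be `with open("perf.log") as fh: print(rate_per_interval(load_messages(fh)))`. Windows with no traffic are simply absent from the result.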
Integration with Binary Protocol
The monitor transparently translates between the ultra-fast binary protocol (UFP) and human-readable JSON:
Binary Layer (UFP):
- 4.2M messages/second throughput
- <200ns P99 latency
- Zero-copy shared memory
Monitor Translation:
- Real-time binary→JSON conversion
- No impact on production performance
- Optional: can tap into message bus without interfering
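To make the binary→JSON step concrete, here is a toy decoder. The frame layout is invented purely for illustration and is not the real UFP wire format, which this document does not specify: an 18-byte little-endian header (timestamp, priority code, type code, source ID, target ID, payload length) followed by a UTF-8 JSON payload. The lookup tables and function names are likewise hypothetical.

```python
import json
import struct

# Hypothetical header: f64 timestamp, u8 priority, u8 type,
# u16 source id, u16 target id, u32 payload length (little-endian, unpadded).
HEADER = struct.Struct("<dBBHHI")

# Hypothetical code tables; a real system would load these from its registry.
PRIORITIES = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]
TYPES = ["TASK_REQUEST", "TASK_RESPONSE", "FEEDBACK", "STATUS_UPDATE", "AGENT_READY"]
AGENTS = ["DIRECTOR", "ARCHITECT", "CONSTRUCTOR"]


def encode_frame(msg: dict) -> bytes:
    """Pack a message dict into the illustrative binary frame."""
    payload = json.dumps(msg["payload"]).encode("utf-8")
    header = HEADER.pack(
        msg["timestamp"],
        PRIORITIES.index(msg["priority"]),
        TYPES.index(msg["type"]),
        AGENTS.index(msg["source_agent"]),
        AGENTS.index(msg["target_agent"]),
        len(payload),
    )
    return header + payload


def decode_frame(frame: bytes) -> dict:
    """Translate one illustrative binary frame into a readable dict."""
    ts, prio, mtype, src, dst, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return {
        "timestamp": ts,
        "priority": PRIORITIES[prio],
        "type": TYPES[mtype],
        "source_agent": AGENTS[src],
        "target_agent": AGENTS[dst],
        "payload": json.loads(payload),
    }
```

Small integer codes in the header keep the hot path compact, and the name lookups happen only on the monitoring side.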
Debugging with the Monitor
Scenario 1: Task Not Completing
Problem: Task stuck, not completing
Debug:
python3 tools/communication_monitor.py --agent PROBLEM-AGENT
Look for:
- Is agent receiving tasks? (TASK_REQUEST)
- Is agent responding? (TASK_RESPONSE)
- Is agent requesting feedback? (FEEDBACK)
- Where is it stuck in the chain?
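The checklist above can be partly automated by pairing requests with responses in a saved log. The message format shown in this document has no correlation ID, so this sketch assumes a response from B to A answers the oldest open request from A to B. That is a heuristic, and `pending_requests` is a hypothetical helper name.

```python
from collections import defaultdict, deque


def pending_requests(messages: list[dict]) -> list[dict]:
    """Return TASK_REQUEST messages that never received a TASK_RESPONSE."""
    # (requester, worker) -> queue of still-unanswered requests
    open_requests = defaultdict(deque)
    for m in sorted(messages, key=lambda m: m["timestamp"]):
        if m["type"] == "TASK_REQUEST":
            open_requests[(m["source_agent"], m["target_agent"])].append(m)
        elif m["type"] == "TASK_RESPONSE":
            # The response travels in the opposite direction to the request.
            queue = open_requests.get((m["target_agent"], m["source_agent"]))
            if queue:
                queue.popleft()
    return [m for queue in open_requests.values() for m in queue]
```

The last agent holding an unanswered request is a reasonable first place to look for where the chain is stuck.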
Scenario 2: Performance Bottleneck
Problem: Slow task execution
Debug:
python3 tools/communication_monitor.py --output debug.log
# Let it run under a representative workload,
# then press Ctrl+C to stop
# Analyze message counts
cat debug.log | jq -r '.target_agent' | sort | uniq -c | sort -rn
Look for:
- Which agent is receiving the most messages?
- Are there retry loops? (same message repeated)
- Are tasks being delegated properly?
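Retry loops can be spotted by counting messages that are identical apart from their ID and timestamp. The sketch below assumes the field names from the Message Format section. The `find_repeats` name and the default threshold of three are arbitrary choices for illustration.

```python
import json
from collections import Counter


def find_repeats(messages: list[dict], min_count: int = 3) -> list[tuple]:
    """Return (signature, count) pairs for messages repeated min_count+ times."""
    signatures = Counter(
        (
            m["source_agent"],
            m["target_agent"],
            m["type"],
            # Serialize with sorted keys so key order does not hide duplicates.
            json.dumps(m["payload"], sort_keys=True),
        )
        for m in messages
    )
    return [(sig, n) for sig, n in signatures.most_common() if n >= min_count]
```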
Scenario 3: Security Test Contamination
Problem: Security tests affected by operational priorities
Debug:
python3 tools/communication_monitor.py \
--agent CHAOS-AGENT \
--output chaos_comms.log
Verify:
- All CHAOS-AGENT messages should go to/from CSO only
- No messages to/from DIRECTOR, ARCHITECT, etc.
- If you see operational messages → contamination detected
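The verification rule above is simple enough to script against a saved log. A minimal sketch follows, assuming message dicts with `source_agent` and `target_agent` fields. The function name and its parameters are hypothetical.

```python
def find_contamination(
    messages: list[dict],
    isolated: str = "CHAOS-AGENT",
    allowed: frozenset = frozenset({"CSO"}),
) -> list[dict]:
    """Return messages where the isolated agent talks to a non-allowed peer."""
    violations = []
    for m in messages:
        src, dst = m["source_agent"], m["target_agent"]
        if src == isolated and dst not in allowed:
            violations.append(m)
        elif dst == isolated and src not in allowed:
            violations.append(m)
    return violations
```

An empty result means every CHAOS-AGENT message in the log went to or came from the CSO. Any entry returned is a candidate contamination event.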
API Integration
You can also programmatically monitor communications:
import asyncio

from tools.communication_monitor import CommunicationMonitor

# Create a monitor restricted to one agent and one priority level
monitor = CommunicationMonitor(
    filter_agent="RUST-INTERNAL-AGENT",
    filter_priority="HIGH",
    output_file="rust_high.log",
)

# Stream messages
asyncio.run(monitor.stream_messages())
Message Format
Each message contains:
{
"message_id": "msg_12345",
"timestamp": 1700000000.123,
"source_agent": "RUST-INTERNAL-AGENT",
"target_agent": "CONSTRUCTOR",
"type": "TASK_RESPONSE",
"priority": "MEDIUM",
"payload": {
"status": "completed",
"result": "Implementation finished",
"metrics": {
"execution_time_ms": 1234,
"success": true
}
}
}
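When post-processing logs, it can help to load each record into a typed structure. The dataclass below mirrors the fields shown above. The class name `AgentMessage` and its `matches` helper are hypothetical and not part of the tool's API, and treating an agent filter as matching either the source or the target is an assumption about how `--agent` behaves.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AgentMessage:
    message_id: str
    timestamp: float
    source_agent: str
    target_agent: str
    type: str
    priority: str
    payload: dict

    @classmethod
    def from_json(cls, text: str) -> "AgentMessage":
        """Build a message from one JSON record; KeyError if a field is missing."""
        data = json.loads(text)
        return cls(**{f.name: data[f.name] for f in fields(cls)})

    def matches(
        self,
        agent: Optional[str] = None,
        priority: Optional[str] = None,
        msg_type: Optional[str] = None,
    ) -> bool:
        """Return True if the message passes every filter that is set."""
        if agent is not None and agent not in (self.source_agent, self.target_agent):
            return False
        if priority is not None and self.priority != priority:
            return False
        if msg_type is not None and self.type != msg_type:
            return False
        return True
```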
Best Practices
During Development
# Monitor your development agents
python3 tools/communication_monitor.py \
--division software_engineering \
--output dev.log
During Testing
# Monitor test execution
python3 tools/communication_monitor.py \
--agent TESTBED \
--priority HIGH \
--output tests.log
During Security Audits
# Monitor security operations
python3 tools/communication_monitor.py \
--division security \
--output security_audit.log
In Production
# Monitor critical messages only
python3 tools/communication_monitor.py \
--priority CRITICAL \
--output production_critical.log
# Poll the latest critical message every 10 seconds (a basic alerting aid)
watch -n 10 'tail -1 production_critical.log'
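Polling with `watch` only displays the last line, so a burst of critical messages between polls can be missed. One possible alternative is to remember the file offset and process every new line. This sketch assumes the log is JSON Lines with a `priority` field. The function names are hypothetical, and `print` stands in for a real alert channel.

```python
import json
import time


def read_new_critical(path: str, offset: int = 0) -> tuple[list[dict], int]:
    """Return CRITICAL messages appended since `offset`, plus the new offset."""
    found = []
    with open(path, "rb") as fh:
        fh.seek(offset)
        for raw in fh:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; retry on next poll
            offset += len(raw)
            try:
                msg = json.loads(raw)
            except json.JSONDecodeError:
                continue  # skip lines that are not valid JSON
            if isinstance(msg, dict) and msg.get("priority") == "CRITICAL":
                found.append(msg)
    return found, offset


def watch(path: str, interval_s: float = 10.0) -> None:
    """Poll the log forever, printing each new CRITICAL message once."""
    offset = 0
    while True:
        alerts, offset = read_new_critical(path, offset)
        for msg in alerts:
            print(f"ALERT {msg.get('source_agent')} → {msg.get('target_agent')}: {msg.get('payload')}")
        time.sleep(interval_s)
```

Calling `watch("production_critical.log")` would run until interrupted. Log rotation or truncation is not handled in this sketch.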
Troubleshooting
Monitor Shows "SIMULATION mode"
Cause: Live communication system not detected
Solution:
- System falls back to simulation for demonstration
- To use live system, ensure orchestrator is running
- Check that binary communication layer is initialized
No Messages Appearing
Cause: Filters too restrictive or no agent activity
Solution:
- Remove filters: python3 tools/communication_monitor.py
- Check that agents are active
- Verify communication system is running
Too Many Messages
Cause: High agent activity
Solution:
- Add filters: --priority HIGH or --agent SPECIFIC-AGENT
- Use file logging: --output msgs.log and analyze offline
- Monitor specific divisions only
Performance Impact
The monitor has minimal performance impact:
- Read-only - Does not interfere with message delivery
- Async - Non-blocking translation and display
- Optional - Can be disabled completely in production
Typical impact: <0.5% CPU, <100MB RAM
See Also
- Dynamic Agent Communication - Communication system architecture
- Organizational Hierarchy - Understanding agent relationships
- Accurate Agent Mapping - Complete 88-agent mapping
- Expected Performance Boosts - Performance improvements
Tool: tools/communication_monitor.py
Status: ✅ Production-ready
Last Updated: 2025-11-16
Version: 3.0.0