Zeronode Architecture Guide
November 26, 2025 · View on GitHub
Overview
Zeronode is a layered microservices framework built on ZeroMQ, providing a clean abstraction for building distributed systems. This guide explains the architecture, event flow, and design decisions.
Architecture Layers
┌─────────────────────────────────────────────────────────────────┐
│ APPLICATION LAYER │
│ (Your Business Logic) │
├─────────────────────────────────────────────────────────────────┤
│ NODE │
│ • Mesh network orchestration (N clients + 1 server) │
│ • Peer state management (joined/left) │
│ • Smart routing (by ID, filter, broadcast) │
│ • Central handler registry │
│ • NodeEvent: PEER_JOINED, PEER_LEFT, READY, ERROR │
├─────────────────────────────────────────────────────────────────┤
│ SERVER CLIENT │
│ • Router socket wrapper • Dealer socket wrapper │
│ • Health checks • Ping mechanism │
│ • Client discovery • Handshake initiation │
│ • ServerEvent: CLIENT_ • ClientEvent: SERVER_ │
│ JOINED, CLIENT_LEFT JOINED, SERVER_LEFT │
├─────────────────────────────────────────────────────────────────┤
│ PROTOCOL │
│ • Message routing (request/reply, tick) │
│ • Envelope management (serialization/deserialization) │
│ • Handler management (PatternEmitter) │
│ • Request tracking (timeouts, promises) │
│ • System events (handshake, ping, stop) │
│ • ProtocolEvent: TRANSPORT_READY, TRANSPORT_NOT_READY │
├─────────────────────────────────────────────────────────────────┤
│ TRANSPORT │
│ • ZeroMQ socket abstraction (Router/Dealer) │
│ • Connection management │
│ • Buffer send/receive │
│ • Transport lifecycle (bind, connect, close) │
│ • TransportEvent: READY, NOT_READY, CLOSED, MESSAGE │
└─────────────────────────────────────────────────────────────────┘
Event Flow: The Complete Picture
1. Transport Layer Events
Transport emits:
TransportEvent.READY- Socket can send/receiveTransportEvent.NOT_READY- Socket lost connectionTransportEvent.CLOSED- Socket permanently closedTransportEvent.MESSAGE- Received message buffer
Key characteristic: Transport layer is connection-oriented (especially for Dealer/client sockets).
2. Protocol Layer Events
Protocol listens to Transport and emits:
ProtocolEvent.TRANSPORT_READY- Bubbled from Transport.READYProtocolEvent.TRANSPORT_NOT_READY- Bubbled from Transport.NOT_READYProtocolEvent.TRANSPORT_CLOSED- Bubbled from Transport.CLOSEDProtocolEvent.ERROR- Protocol-level errors
Protocol also handles:
- System messages (handshake, ping, stop)
- Application messages (requests, ticks, replies)
- Request tracking and timeouts
3. Server Layer Events
Server listens to Protocol and emits:
// FROM PROTOCOL
ProtocolEvent.TRANSPORT_READY → ServerEvent.READY
ProtocolEvent.TRANSPORT_NOT_READY → ServerEvent.NOT_READY
ProtocolEvent.TRANSPORT_CLOSED → ServerEvent.CLOSED
// FROM APPLICATION LOGIC (Message-Based Discovery)
HANDSHAKE_INIT_FROM_CLIENT → ServerEvent.CLIENT_JOINED
CLIENT_PING → (update lastSeen timestamp)
CLIENT_STOP → ServerEvent.CLIENT_LEFT
TIMEOUT → ServerEvent.CLIENT_LEFT (reason: 'TIMEOUT')
Server tracks clients via:
clientLastSeenMap (clientId → timestamp)- Health check interval (default: 30s)
- Ghost timeout (default: 60s)
4. Client Layer Events
Client listens to Protocol and emits:
// FROM PROTOCOL
ProtocolEvent.TRANSPORT_READY → ClientEvent.READY (then sends handshake)
ProtocolEvent.TRANSPORT_NOT_READY → ClientEvent.NOT_READY
ProtocolEvent.TRANSPORT_CLOSED → ClientEvent.CLOSED or NOT_READY
// FROM APPLICATION LOGIC (System Messages)
HANDSHAKE_ACK_FROM_SERVER → ClientEvent.SERVER_JOINED (starts ping)
SERVER_STOP → ClientEvent.SERVER_LEFT
Client tracks server via:
serverId(null until handshake complete)- Ping interval (default: 10s)
5. Node Layer Events
Node listens to Server/Client and emits:
// FROM SERVER
ServerEvent.CLIENT_JOINED → NodeEvent.PEER_JOINED (direction: 'downstream')
ServerEvent.CLIENT_LEFT → NodeEvent.PEER_LEFT (direction: 'downstream')
// FROM CLIENT
ClientEvent.SERVER_JOINED → NodeEvent.PEER_JOINED (direction: 'upstream')
ClientEvent.NOT_READY → NodeEvent.PEER_LEFT (direction: 'upstream')
ClientEvent.CLOSED → NodeEvent.PEER_LEFT (direction: 'upstream')
ClientEvent.SERVER_LEFT → NodeEvent.PEER_LEFT (direction: 'upstream')
Node tracks peers via:
joinedPeersSet (peerId → boolean)peerOptionsMap (peerId → options)peerDirectionMap (peerId → 'upstream' | 'downstream')
Complete Event Flow: Client Death Scenario
Let's trace what happens when a client dies (killed with Ctrl+C):
TIME LAYER EVENT ACTION
──── ───────── ───────────────────────────── ──────────────────────────
t=0 Process Client killed (Ctrl+C)
t=0 Transport TCP connection closes
(Client)
t=0 Transport Detects connection loss Emits: Transport.NOT_READY
(Client)
t=0 Protocol Receives Transport.NOT_READY Emits: Protocol.TRANSPORT_NOT_READY
(Client)
t=0 Client Receives Protocol.TRANSPORT_ Stops ping
NOT_READY Emits: Client.NOT_READY
t=0 Node Receives Client.NOT_READY Removes from joinedPeers
(Client) Emits: Node.PEER_LEFT
(direction: 'upstream')
─────────────────────────────────────────────────────────────────────────
Meanwhile, on the SERVER side...
t=0 Transport ZeroMQ Router socket... (NO EVENT - by design)
(Server)
t=2 Server Health check runs clientLastSeen: 2s ago (OK)
t=4 Server Health check runs clientLastSeen: 4s ago (OK)
t=6 Server Health check runs clientLastSeen: 6s ago (OK)
t=8 Server Health check runs clientLastSeen: 8s ago (OK)
t=10 Server Health check runs clientLastSeen: 10s ago (TIMEOUT!)
Deletes from clientLastSeen
Emits: Server.CLIENT_LEFT
(reason: 'TIMEOUT')
t=10 Node Receives Server.CLIENT_LEFT Removes from joinedPeers
(Server) Emits: Node.PEER_LEFT
(direction: 'downstream')
Key Design Decisions
1. Why Server Uses Timeout-Based Detection
ZeroMQ Router sockets (server) do NOT emit per-peer disconnect events. This is intentional:
- Message-oriented design: Router focuses on message routing, not connection tracking
- Multi-peer scalability: Tracking N connections would add overhead
- Transport independence: Works same for tcp://, ipc://, inproc://
Solution: Application-level heartbeating
- Standard pattern in all message-oriented systems
- RabbitMQ, Kafka, Redis all use this approach
- Configurable: balance between responsiveness and overhead
2. Why Client Gets Immediate Notification
ZeroMQ Dealer sockets (client) CAN detect server disconnect immediately:
- Single connection: Only talks to one server
- Connection-oriented: ZeroMQ can emit events for this use case
- Transport layer: Dealer socket gets TCP FIN/RST notifications
Result: Client-side disconnects are immediate (milliseconds)
3. State Management: Single Source of Truth
Node layer maintains THE authoritative peer state:
// In joinedPeers Set → routable
// NOT in joinedPeers Set → not routable
_addJoinedPeer(peerId) {
joinedPeers.add(peerId) // NOW routable
}
_removeJoinedPeer(peerId) {
joinedPeers.delete(peerId) // NOW not routable
}
Benefits:
- No querying Server/Client during routing (fast)
- No state divergence
- Clear semantics: in Set = online, not in Set = offline
4. Handshake Protocol
Client → Server handshake:
1. Client: TRANSPORT_READY → sends HANDSHAKE_INIT_FROM_CLIENT (with options)
2. Server: Receives handshake → stores clientId in clientLastSeen
3. Server: Emits CLIENT_JOINED → sends HANDSHAKE_ACK_FROM_SERVER (with options)
4. Client: Receives ack → stores serverId → starts ping
5. Client: Emits SERVER_JOINED
Why this design:
- Peer discovery: Server doesn't know clients until they announce
- Options exchange: Both peers learn each other's metadata
- Graceful: Works with any transport (tcp, ipc, inproc)
Configuration
Server Configuration
const server = new Node({
id: 'server-node',
config: {
CLIENT_HEALTH_CHECK_INTERVAL: 2000, // Check every 2 seconds
CLIENT_GHOST_TIMEOUT: 10000 // Timeout after 10 seconds
}
})
Client Configuration
const client = new Node({
id: 'client-node',
config: {
PING_INTERVAL: 2000, // Ping every 2 seconds
CLIENT_HANDSHAKE_TIMEOUT: 10000 // Handshake timeout
}
})
Timeout Tuning Guide
| Use Case | Ping Interval | Health Check | Timeout | Trade-off |
|---|---|---|---|---|
| Low latency | 1s | 1s | 3s | Fast detection, more traffic |
| Balanced | 2s | 2s | 10s | Good balance (recommended) |
| Efficient | 10s | 30s | 60s | Low overhead, slow detection |
Peer Lifecycle
Upstream Peer (Client connecting TO server)
1. client.connect({ address })
2. Transport connects → TRANSPORT_READY
3. Client sends handshake
4. Server receives → CLIENT_JOINED
5. Server sends ack
6. Client receives → SERVER_JOINED
7. Node emits PEER_JOINED (direction: 'upstream')
[... peer is active ...]
8. Disconnect (any reason)
9. Client emits NOT_READY/CLOSED/SERVER_LEFT
10. Node emits PEER_LEFT (direction: 'upstream')
Downstream Peer (Client connected FROM server)
1. Client connects to our server
2. Server receives handshake → CLIENT_JOINED
3. Node emits PEER_JOINED (direction: 'downstream')
[... peer is active, pings arrive ...]
4. Ping stops arriving (client died)
5. Health check timeout expires
6. Server emits CLIENT_LEFT (reason: 'TIMEOUT')
7. Node emits PEER_LEFT (direction: 'downstream')
Error Handling
Transport Errors
// Emitted by Protocol, bubbled to Node
node.on(NodeEvent.ERROR, ({ source, error }) => {
if (source === 'server') {
// Server transport error
} else if (source === 'client') {
// Client transport error
}
})
Application Errors
// NO_NODES_MATCH_FILTER - no peers match routing criteria
node.on('error', (err) => {
if (err.code === 'NO_NODES_MATCH_FILTER') {
console.log('No peers available for routing')
}
})
Best Practices
1. Always Handle PEER_LEFT
node.on(NodeEvent.PEER_LEFT, ({ peerId, direction, reason }) => {
console.log(`Peer ${peerId} left (${direction}): ${reason}`)
// Clean up any peer-specific resources
})
2. Track Connected Peers
const connectedPeers = new Set()
node.on(NodeEvent.PEER_JOINED, ({ peerId }) => {
connectedPeers.add(peerId)
})
node.on(NodeEvent.PEER_LEFT, ({ peerId }) => {
connectedPeers.delete(peerId)
})
3. Only Send When Peers Exist
if (connectedPeers.size > 0) {
node.tickAny({ event: 'heartbeat', data: { ... } })
}
4. Use Appropriate Timeouts
// For request/reply - use timeout
const response = await node.request({
to: 'peer-id',
event: 'operation',
data: payload,
timeout: 5000 // 5 second timeout
})
Performance Characteristics
Latency
- Request/Reply: ~0.3ms average (measured)
- Tick (fire-and-forget): < 0.1ms (no response tracking)
- Peer discovery: Immediate (message-based)
- Disconnect detection (server): Configurable (2-60 seconds)
- Disconnect detection (client): Immediate (< 100ms)
Overhead
- Per peer: Minimal (just tracking in Maps/Sets)
- Ping traffic: 1 message per interval per client
- Health check: Single timer per server
Summary
Zeronode provides a clean, layered architecture where:
- Transport handles raw socket connections
- Protocol handles message serialization and routing
- Server/Client handle lifecycle and peer management
- Node provides unified API and smart routing
The event flow is straightforward and predictable, with clear separation of concerns. Disconnect detection works differently for client vs. server due to ZeroMQ's design, but this is standard in message-oriented systems.
The architecture is production-ready and follows industry best practices for distributed systems.