Huginn-Net Benchmarks
May 17, 2026
Performance benchmarks for all Huginn-Net protocol libraries.
Numbers source: Criterion.rs medians across 5 runs (./benches/run_bench.sh <proto> 5). Per-packet values are the Criterion median ÷ packets per iteration (TCP: 43,000 · HTTP: 16,000 · TLS: 1,000). The inline measure_average_time summary printed at the end of each run uses 3–10 iterations and no warmup; only Criterion output is authoritative.
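The per-packet conversion described above can be sketched as follows. This is a minimal illustration, not code from the benchmark harness: the function names are hypothetical, and the iteration median used in `main` is a made-up input rather than a figure from the report.

```rust
use std::time::Duration;

/// Per-packet cost in nanoseconds: the Criterion median for one iteration
/// divided by the number of packets processed in that iteration.
fn per_packet_ns(iteration_median: Duration, packets_per_iteration: u64) -> f64 {
    iteration_median.as_nanos() as f64 / packets_per_iteration as f64
}

/// Throughput in packets per second implied by a per-packet cost.
fn throughput_pps(per_packet_ns: f64) -> f64 {
    1e9 / per_packet_ns
}

fn main() {
    // Hypothetical median for one TCP iteration (43,000 packets).
    // This input is an assumption for illustration, not a reported value.
    let median = Duration::from_micros(45_580);
    let ns = per_packet_ns(median, 43_000);
    println!("{:.0} ns/packet, {:.0} pps", ns, throughput_pps(ns));
}
```

The same two steps apply to HTTP and TLS with 16,000 and 1,000 packets per iteration respectively.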
About CPU percentages: All "1 Gbps CPU" / "10 Gbps CPU" values are projections derived from measured throughput, not measurements under sustained packet load on a real NIC. They assume linear scaling and ignore kernel networking stack overhead, NIC IRQ handling, memory contention at line rate, and CPU clock stability. Treat them as planning estimates: server-grade hardware (Xeon/EPYC, no thermal throttling, NIC flow steering, no background processes) typically performs 30–80% better than the 8-core laptop used for these runs; pathological workloads (cross-NUMA, deeply fragmented flows) may underperform these projections.
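The report does not spell out the projection formula here, so the sketch below is an inference rather than the authors' method. The published percentages are consistent with dividing a line-rate packet count by the measured throughput, where line rate is computed for full-size Ethernet frames occupying 1538 bytes on the wire (1500 MTU + 14 header + 4 FCS + 8 preamble + 12 inter-frame gap). Both the frame size and the function names are assumptions.

```rust
/// ASSUMPTION: on-wire bytes per full-size Ethernet frame. Inferred from the
/// published tables, not stated in the report.
const ON_WIRE_BYTES: f64 = 1538.0;

/// Packets per second needed to fill a link of the given speed.
fn line_rate_pps(link_gbps: f64) -> f64 {
    link_gbps * 1e9 / (ON_WIRE_BYTES * 8.0)
}

/// Projected CPU use (percent of the measured configuration's capacity),
/// assuming linear scaling as the report describes.
fn projected_cpu_percent(link_gbps: f64, measured_pps: f64) -> f64 {
    line_rate_pps(link_gbps) / measured_pps * 100.0
}

fn main() {
    // Measured sequential throughputs taken from the summary table.
    for (name, pps) in [("TCP", 942_000.0), ("HTTP", 720_000.0), ("TLS", 50_000.0)] {
        println!(
            "{name}: {:.1}% @ 1 Gbps, {:.0}% @ 10 Gbps",
            projected_cpu_percent(1.0, pps),
            projected_cpu_percent(10.0, pps)
        );
    }
}
```

Under this assumption, a value above 100% means the measured configuration cannot keep up with the link at that packet size.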
Available Benchmarks
| Protocol | Benchmark File | Library | Command |
|---|---|---|---|
| TLS | bench_tls.rs | huginn-net-tls | cargo bench -p huginn-net-tls |
| TCP | bench_tcp.rs | huginn-net-tcp | cargo bench -p huginn-net-tcp |
| HTTP | bench_http.rs | huginn-net-http | cargo bench -p huginn-net-http |
Performance Summary
Sequential Mode (Single-Thread)
| Protocol | Time/Packet | Throughput | 1 Gbps CPU | 10 Gbps CPU |
|---|---|---|---|---|
| TCP | ~1.06 µs | ~942K pps | ~8.6% ✓ | ~86% ✓ |
| HTTP | ~1.39 µs | ~720K pps | ~11.3% ✓ | ~113% [OVERLOAD] |
| TLS | ~20 µs | ~50K pps | ~163% [OVERLOAD] | ~1625% [OVERLOAD] |
Parallel Mode
All three protocols share the same dispatch architecture: hash-based flow routing (4-tuple src/dst IP+port → worker) so the same TCP flow always lands on the same worker, enabling stateful per-worker processing via TtlCache.
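A minimal sketch of this kind of hash-based flow routing is shown below. It is not the Huginn-Net dispatcher: `FlowKey` and `worker_for` are hypothetical names, the standard library's `DefaultHasher` stands in for whatever hash the library actually uses, and the sketch does not normalize direction, so the two directions of a connection may map to different workers.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::{IpAddr, Ipv4Addr};

/// Hypothetical 4-tuple identifying a flow.
#[derive(Hash, Clone, Copy, PartialEq, Eq, Debug)]
struct FlowKey {
    src_ip: IpAddr,
    src_port: u16,
    dst_ip: IpAddr,
    dst_port: u16,
}

/// Map a flow to a worker index in `0..workers`.
/// `DefaultHasher::new()` uses fixed keys, so the same flow always hashes
/// to the same worker within a run. `workers` must be non-zero.
fn worker_for(flow: &FlowKey, workers: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    flow.hash(&mut hasher);
    (hasher.finish() % workers as u64) as usize
}

fn main() {
    let flow = FlowKey {
        src_ip: IpAddr::V4(Ipv4Addr::new(10, 0, 0, 1)),
        src_port: 51_000,
        dst_ip: IpAddr::V4(Ipv4Addr::new(10, 0, 0, 2)),
        dst_port: 443,
    };
    // Every packet of this flow is dispatched to the same worker.
    println!("flow -> worker {}", worker_for(&flow, 4));
}
```

Because routing depends only on the flow key, each worker can keep its own per-flow state without locking, at the cost of uneven load when a few flows carry most of the traffic.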
TCP
| Workers | Throughput | Speedup | 1 Gbps CPU | 10 Gbps CPU |
|---|---|---|---|---|
| 1 (seq) | ~942K pps | 1.00x | ~8.6% ✓ | ~86% ✓ |
| 2 | ~2.09M pps | 2.22x | ~3.9% ✓ | ~39% ✓ |
| 4 | ~2.10M pps | 2.23x | ~3.9% ✓ | ~39% ✓ |
| 8 | ~2.06M pps | 2.19x | ~3.9% ✓ | ~39% ✓ |
HTTP
| Workers | Throughput | Speedup | 1 Gbps CPU | 10 Gbps CPU |
|---|---|---|---|---|
| 1 (seq) | ~720K pps | 1.00x | ~11.3% ✓ | ~113% [OVERLOAD] |
| 2 | ~1.55M pps | 2.15x | ~5.3% ✓ | ~52.6% ✓ |
| 4 | ~1.09M pps | 1.51x | ~7.5% ✓ | ~74.8% ✓ |
| 8 | ~982K pps | 1.36x | ~8.3% ✓ | ~82.7% ✓ |
TLS
| Workers | Throughput | Speedup | 1 Gbps CPU | 10 Gbps CPU |
|---|---|---|---|---|
| 1 (seq) | ~50K pps | 1.00x | ~163% [OVERLOAD] | ~1625% [OVERLOAD] |
| 2 | ~97K pps | 1.94x | ~84% ✓ | ~838% [OVERLOAD] |
| 4 | ~96K pps | 1.92x | ~84% ✓ | ~839% [OVERLOAD] |
| 8 | ~96K pps | 1.92x | ~85% ✓ | ~843% [OVERLOAD] |
Key Insights
Protocol Ranking by Throughput
- TCP: ~942K pps sequential, ~2.10M pps parallel. The only protocol for which 10 Gbps projects to fit on a single core (~86% CPU); parallel mode projects to ~39%.
- HTTP: ~720K pps sequential, ~1.55M pps parallel. Sequential mode projects to overload at 10 Gbps line rate (~113% CPU), so production deployments at that rate are projected to need at least 2 workers (~53% at 10 Gbps).
- TLS: ~50K pps sequential, ~97K pps parallel. Even 1 Gbps is projected to require parallel mode (~84% with 2 workers). 10 Gbps is not projected to be feasible on 8-core hardware at any worker count; JA4 calculation and TCP reassembly dominate the cost.
Feature Overhead
| Protocol | Feature | Without | With | Overhead |
|---|---|---|---|---|
| TCP | OS matching | ~556 ns (1.80M pps) | ~1.07 µs (930K pps) | ~93% |
| TCP | MTU/link matching | ~561 ns (1.78M pps) | ~1.06 µs (942K pps) | ~89% |
| HTTP | Browser matching | ~1.10 µs (910K pps) | ~1.42 µs (700K pps) | ~29% |
| HTTP | Server matching | ~1.10 µs (915K pps) | ~1.48 µs (680K pps) | ~35% |
Matching always adds cost: OS, MTU, browser, and server matching each require a database lookup on every successful detection. Disable matching for use cases that only need flow tracking or header capture without OS/UA fingerprinting.
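The overhead column is the relative increase in per-packet time when a feature is enabled. A small sketch of that arithmetic, using a hypothetical helper name and the rounded per-packet times from the table (so results can differ from the table by about a percentage point):

```rust
/// Relative overhead in percent of enabling a feature, from per-packet
/// times measured without and with it (same unit for both).
fn overhead_percent(without_ns: f64, with_ns: f64) -> f64 {
    (with_ns - without_ns) / without_ns * 100.0
}

fn main() {
    // Rounded per-packet times from the Feature Overhead table, in nanoseconds.
    let rows = [
        ("TCP MTU/link matching", 561.0, 1060.0),
        ("HTTP browser matching", 1100.0, 1420.0),
        ("HTTP server matching", 1100.0, 1480.0),
    ];
    for (feature, without, with) in rows {
        println!("{feature}: {:.0}%", overhead_percent(without, with));
    }
}
```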
Parallel Scaling Behavior
- TCP: 2, 4, and 8 workers plateau at ~2.1M pps (differences are within measurement noise). Two workers already saturate the achievable throughput for this workload; more workers don't help on 8-core hardware.
- HTTP: 2 workers is optimal (~1.55M pps, +115% over sequential). 4 and 8 workers degrade (~1.09M and ~982K pps) because flow-based hashing concentrates per-connection traffic onto fewer workers as the pool grows.
- TLS: 2, 4, and 8 workers plateau at ~97K pps. The bottleneck is per-packet TCP reassembly, not worker count, so adding workers yields no gain.
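One way to turn these scaling results into a worker count is to pick the smallest pool whose throughput is within a noise tolerance of the best measured value. The sketch below illustrates that rule with the throughputs from the tables above; the function name and the 2% tolerance are assumptions, not part of Huginn-Net.

```rust
/// Smallest worker count whose throughput is within `tolerance`
/// (e.g. 0.02 for 2%) of the best measured throughput.
/// `results` holds (workers, measured pps) pairs. Returns None if empty.
fn smallest_sufficient_workers(results: &[(usize, f64)], tolerance: f64) -> Option<usize> {
    let max_pps = results
        .iter()
        .map(|&(_, pps)| pps)
        .fold(f64::NEG_INFINITY, f64::max);
    results
        .iter()
        .filter(|&&(_, pps)| pps >= max_pps * (1.0 - tolerance))
        .map(|&(workers, _)| workers)
        .min()
}

fn main() {
    // Throughputs taken from the Parallel Mode tables.
    let tcp = [(1, 942_000.0), (2, 2_090_000.0), (4, 2_100_000.0), (8, 2_060_000.0)];
    let http = [(1, 720_000.0), (2, 1_550_000.0), (4, 1_090_000.0), (8, 982_000.0)];
    let tls = [(1, 50_000.0), (2, 97_000.0), (4, 96_000.0), (8, 96_000.0)];
    for (name, rows) in [("TCP", &tcp), ("HTTP", &http), ("TLS", &tls)] {
        println!("{name}: {:?} workers", smallest_sufficient_workers(rows, 0.02));
    }
}
```

Since optimal worker counts are hardware-specific, a rule like this is only meaningful when fed measurements from the target machine.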
Detailed Analysis Reports
- TCP Analysis — OS fingerprinting, MTU detection, uptime calculation
- HTTP Analysis — Browser/server detection, flow tracking
- TLS Analysis — JA4 fingerprinting, TCP reassembly architecture
Hardware: x86_64, 8-core laptop. Optimal worker counts and absolute throughput are hardware-specific.