Arrival Patterns: Simulating Realistic Traffic
March 7, 2026
When benchmarking with --request-rate, AIPerf can vary how requests arrive over time. The --arrival-pattern option controls the distribution of inter-arrival times, letting you simulate everything from perfectly regular traffic to bursty real-world patterns.
Why Arrival Patterns Matter
Real traffic doesn't arrive at perfectly regular intervals. Traffic comes in bursts—quiet periods followed by sudden spikes. How your server handles this variance affects real-world performance.
Constant Pattern:         Poisson Pattern:          Gamma (bursty):
|   |   |   |   |         | |   |  | ||   | |       |  |||   |    ||| |
└──────────────────▶      └──────────────────▶      └──────────────────▶
Perfect spacing           Natural variance          Clustered bursts
(unrealistic)             (typical traffic)         (stress testing)
Quick Start
# Default: Poisson (realistic)
aiperf profile --request-rate 50 ...
# Explicit: Constant (deterministic)
aiperf profile --request-rate 50 --arrival-pattern constant ...
# Bursty: Gamma with low smoothness
aiperf profile --request-rate 50 --arrival-pattern gamma --arrival-smoothness 0.5 ...
Available Patterns
Constant
--arrival-pattern constant
Requests arrive at perfectly regular intervals: exactly 1/rate seconds apart.
Inter-arrival times:
10 QPS → every 100ms: |····|····|····|····|····|····|
0 100 200 300 400 500 600 ms
Use cases:
- Baseline measurements with no variance
- Debugging timing issues
- Comparing against variable patterns
- Deterministic, reproducible tests
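As a rough illustration of the 1/rate spacing, the sketch below computes send offsets for a constant schedule. It is not AIPerf's scheduler; `constant_offsets` is a hypothetical helper introduced only for this example.

```python
def constant_offsets(rate: float, count: int) -> list[float]:
    """Return send times (seconds) for `count` requests spaced exactly 1/rate apart."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    interval = 1.0 / rate
    return [i * interval for i in range(count)]


if __name__ == "__main__":
    # 10 QPS: one request every 100 ms, matching the timeline above.
    for t in constant_offsets(rate=10, count=7):
        print(f"{t * 1000:.0f} ms")
```

Because every gap is identical, two runs with the same rate produce the same schedule, which is what makes this pattern useful as a baseline.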
Poisson (Default)
--arrival-pattern poisson
Requests arrive according to a Poisson process—the mathematical model for random events at a constant average rate. Inter-arrival times follow an exponential distribution.
Inter-arrival times (exponential):
10 QPS average: |··|······|·|···|····|··|·······|···|
Varied gaps, same average rate over time
Characteristics:
- Mean inter-arrival = 1/rate (same as constant)
- Variance = (1/rate)² (natural randomness)
- Sometimes requests cluster, sometimes gaps appear
- Models real user behavior where arrivals are independent
Use cases:
- Default realistic traffic simulation
- Standard load testing
- Comparing to theoretical queueing models
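The mean and variance listed under Characteristics can be checked empirically. The following is a minimal sketch using only Python's standard library; `poisson_gaps` is a hypothetical helper, and the code says nothing about how AIPerf draws its samples internally.

```python
import random
import statistics


def poisson_gaps(rate: float, count: int, seed: int = 0) -> list[float]:
    """Draw `count` exponential inter-arrival gaps (seconds) with mean 1/rate."""
    rng = random.Random(seed)
    # expovariate takes the rate (1 / mean), not the mean itself.
    return [rng.expovariate(rate) for _ in range(count)]


if __name__ == "__main__":
    rate = 10.0  # requests per second
    gaps = poisson_gaps(rate, count=100_000)
    print(f"mean gap: {statistics.fmean(gaps):.4f} s (theory {1 / rate:.4f})")
    print(f"variance: {statistics.pvariance(gaps):.5f} s^2 (theory {1 / rate**2:.5f})")
```

With enough samples, both statistics should land close to the theoretical values, while individual gaps range from near zero to several times the mean.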
Gamma (Tunable Burstiness)
--arrival-pattern gamma --arrival-smoothness <value>
The Gamma pattern generalizes Poisson: inter-arrival times are drawn from a Gamma distribution, and a smoothness parameter controls how bursty or regular the arrivals are:
| Smoothness | Behavior | Variance | Use Case |
|---|---|---|---|
| < 1.0 | Bursty: clustered arrivals with gaps | Higher | Stress testing, worst-case scenarios |
| = 1.0 | Poisson: natural randomness | Medium | Same as --arrival-pattern poisson |
| > 1.0 | Smooth: more regular arrivals | Lower | Controlled testing, less noise |
Smoothness = 0.5 (bursty):
|||| ||| ||||| ||
Clusters of requests with quiet gaps
Smoothness = 1.0 (Poisson):
| || | | | || | | || |
Natural variance
Smoothness = 2.0 (smooth):
| | | | | | | | | | | | | |
More regular, approaches constant
Mathematical note: The smoothness parameter is the Gamma distribution's shape parameter (k). Scale is automatically computed to maintain the correct mean rate.
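To make the shape/scale relationship concrete, here is a small sketch that samples gaps with shape k = smoothness and scale = 1/(k · rate), the scale that keeps the mean gap at 1/rate. It uses the standard library's `random.gammavariate`; `gamma_gaps` is a hypothetical helper and is not taken from AIPerf's source.

```python
import random
import statistics


def gamma_gaps(rate: float, smoothness: float, count: int, seed: int = 0) -> list[float]:
    """Draw gaps from Gamma(shape=smoothness, scale=1/(smoothness*rate)).

    The scale is chosen so the mean gap stays at 1/rate for any smoothness.
    """
    rng = random.Random(seed)
    scale = 1.0 / (smoothness * rate)
    return [rng.gammavariate(smoothness, scale) for _ in range(count)]


if __name__ == "__main__":
    rate = 50.0  # requests per second
    for k in (0.5, 1.0, 2.0):
        gaps = gamma_gaps(rate, k, count=100_000)
        mean = statistics.fmean(gaps)
        cv = statistics.pstdev(gaps) / mean
        print(f"smoothness={k}: mean gap={mean * 1000:.2f} ms, CV={cv:.2f} (theory {k ** -0.5:.2f})")
```

The mean gap should stay near 20 ms for all three values, while the spread shrinks as smoothness grows.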
Concurrency Burst
# No --request-rate, just --concurrency
aiperf profile --concurrency 50 ...
When you omit --request-rate and only specify --concurrency, AIPerf uses burst mode: zero delay between request dispatches, limited only by the concurrency semaphore.
Burst mode (concurrency=3):
[Req1]────────────────────────────▶
[Req2]────────────────────────────▶
[Req3]────────────────────────────▶
[Req4]──────────────────────▶ ← Starts when any slot frees
Use cases:
- Maximum throughput discovery
- Saturation testing
- Finding server capacity limits
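The slot behavior in the diagram can be sketched with an `asyncio.Semaphore`. This is an illustrative model only, not AIPerf's dispatcher: `run_burst` is a hypothetical function and `asyncio.sleep` stands in for the real request.

```python
import asyncio


async def run_burst(total_requests: int, concurrency: int, service_time: float = 0.01) -> int:
    """Dispatch requests back to back; return the peak number in flight."""
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def one_request() -> None:
        nonlocal in_flight, peak
        async with semaphore:  # wait for a free slot, then send immediately
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(service_time)  # stand-in for the real HTTP call
            in_flight -= 1

    # No delay between dispatches: the semaphore is the only throttle.
    await asyncio.gather(*(one_request() for _ in range(total_requests)))
    return peak


if __name__ == "__main__":
    print("peak in flight:", asyncio.run(run_burst(total_requests=12, concurrency=3)))
```

The number of requests in flight never exceeds the concurrency limit, and a new request starts as soon as any slot frees.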
vLLM Compatibility
AIPerf's --arrival-smoothness is compatible with vLLM's --burstiness parameter:
# Same distribution as vLLM with --burstiness 0.5
aiperf profile \
--request-rate 50 \
--arrival-pattern gamma \
--arrival-smoothness 0.5 \
...
This allows direct comparison between AIPerf and vLLM benchmark results when using the same smoothness/burstiness value.
Examples
Baseline vs Realistic Comparison
Compare how your server handles ideal vs realistic traffic:
# Run 1: Constant (baseline)
aiperf profile \
--model your-model \
--url localhost:8000 \
--endpoint-type chat \
--streaming \
--request-rate 100 \
--arrival-pattern constant \
--benchmark-duration 60 \
--output-dir results/constant
Expected Output (Run 1):
INFO Starting AIPerf System
INFO Using Request_Rate strategy with constant arrival pattern
INFO AIPerf System is PROFILING
Profiling: [01:00] - Running for 60 seconds...
INFO Benchmark completed successfully
INFO Results saved to: results/constant/
NVIDIA AIPerf | LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ Metric ┃ avg ┃ min ┃ max ┃ p99 ┃ p50 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ Request Latency (ms) │ 178.45 │ 156.23 │ 212.34 │ 205.67 │ 176.89 │
│ Time to First Token (ms) │ 45.67 │ 38.12 │ 58.34 │ 56.23 │ 44.90 │
│ Inter Token Latency (ms) │ 11.23 │ 9.45 │ 14.67 │ 14.12 │ 11.01 │
│ Request Throughput (req/s) │ 98.45 │ - │ - │ - │ - │
└────────────────────────────┴────────┴────────┴────────┴────────┴────────┘
JSON Export: results/constant/profile_export_aiperf.json
# Run 2: Poisson (realistic)
aiperf profile \
--model your-model \
--url localhost:8000 \
--endpoint-type chat \
--streaming \
--request-rate 100 \
--arrival-pattern poisson \
--benchmark-duration 60 \
--output-dir results/poisson
Expected Output (Run 2):
INFO Starting AIPerf System
INFO Using Request_Rate strategy with poisson arrival pattern
INFO AIPerf System is PROFILING
Profiling: [01:00] - Running for 60 seconds...
INFO Benchmark completed successfully
INFO Results saved to: results/poisson/
NVIDIA AIPerf | LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ Metric ┃ avg ┃ min ┃ max ┃ p99 ┃ p50 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ Request Latency (ms) │ 182.34 │ 148.56 │ 267.89 │ 245.67 │ 179.12 │
│ Time to First Token (ms) │ 47.89 │ 35.67 │ 78.23 │ 72.45 │ 46.34 │
│ Inter Token Latency (ms) │ 11.67 │ 8.90 │ 19.34 │ 17.89 │ 11.23 │
│ Request Throughput (req/s) │ 96.78 │ - │ - │ - │ - │
└────────────────────────────┴────────┴────────┴────────┴────────┴────────┘
JSON Export: results/poisson/profile_export_aiperf.json
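Compare TTFT and throughput between the two runs. In the sample output above, averages and medians barely move, but p99 and max grow under Poisson; a widening gap between p50 and p99 indicates that the server is sensitive to arrival variance.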
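One way to quantify the difference is the relative change per metric. The sketch below hard-codes a few values copied from the two sample tables above; in practice you would read them from the JSON exports, whose schema is not shown in this document and is therefore not parsed here. `percent_change` is a hypothetical helper.

```python
# Values copied from the two sample tables above (milliseconds).
constant = {"ttft_p50": 44.90, "ttft_p99": 56.23, "latency_p99": 205.67}
poisson = {"ttft_p50": 46.34, "ttft_p99": 72.45, "latency_p99": 245.67}


def percent_change(baseline: float, observed: float) -> float:
    """Relative change of `observed` versus `baseline`, in percent."""
    return (observed - baseline) / baseline * 100.0


if __name__ == "__main__":
    for metric in constant:
        change = percent_change(constant[metric], poisson[metric])
        print(f"{metric}: {constant[metric]:.2f} -> {poisson[metric]:.2f} ms ({change:+.1f}%)")
```

A small change at p50 combined with a large change at p99 suggests the median request is unaffected while the tail absorbs the queueing caused by clustered arrivals.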
Stress Testing with Bursty Traffic
Test how your server handles request bursts:
aiperf profile \
--model your-model \
--url localhost:8000 \
--endpoint-type chat \
--streaming \
--request-rate 100 \
--arrival-pattern gamma \
--arrival-smoothness 0.3 \
--benchmark-duration 120
Sample Output (Successful Run):
INFO Starting AIPerf System
INFO Using Request_Rate strategy with gamma arrival pattern (smoothness: 0.3)
INFO AIPerf System is PROFILING
Profiling: [02:00] - Running for 120 seconds...
INFO Benchmark completed successfully
INFO Results saved to: artifacts/your-model-chat-rate100/
NVIDIA AIPerf | LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ Metric ┃ avg ┃ min ┃ max ┃ p99 ┃ p50 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ Request Latency (ms) │ 198.67 │ 142.34 │ 398.12 │ 356.78 │ 189.45 │
│ Time to First Token (ms) │ 52.34 │ 34.56 │ 112.34 │ 98.67 │ 49.23 │
│ Inter Token Latency (ms) │ 12.89 │ 8.23 │ 28.45 │ 24.67 │ 12.01 │
│ Request Throughput (req/s) │ 93.45 │ - │ - │ - │ - │
└────────────────────────────┴────────┴────────┴────────┴────────┴────────┘
JSON Export: artifacts/your-model-chat-rate100/profile_export_aiperf.json
Smoothness of 0.3 creates highly bursty traffic: several requests arrive nearly simultaneously, followed by quiet periods.
Smooth Traffic for Noise Reduction
Reduce variance in measurements for controlled experiments:
aiperf profile \
--model your-model \
--url localhost:8000 \
--endpoint-type chat \
--streaming \
--request-rate 50 \
--arrival-pattern gamma \
--arrival-smoothness 5.0 \
--benchmark-duration 60
Sample Output (Successful Run):
INFO Starting AIPerf System
INFO Using Request_Rate strategy with gamma arrival pattern (smoothness: 5.0)
INFO AIPerf System is PROFILING
Profiling: [01:00] - Running for 60 seconds...
INFO Benchmark completed successfully
INFO Results saved to: artifacts/your-model-chat-rate50/
NVIDIA AIPerf | LLM Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┓
┃ Metric ┃ avg ┃ min ┃ max ┃ p99 ┃ p50 ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━┩
│ Request Latency (ms) │ 165.23 │ 148.90 │ 189.45 │ 184.56 │ 164.12 │
│ Time to First Token (ms) │ 42.67 │ 36.89 │ 52.34 │ 50.12 │ 42.01 │
│ Inter Token Latency (ms) │ 10.89 │ 9.23 │ 13.45 │ 13.01 │ 10.67 │
│ Request Throughput (req/s) │ 49.23 │ - │ - │ - │ - │
└────────────────────────────┴────────┴────────┴────────┴────────┴────────┘
JSON Export: artifacts/your-model-chat-rate50/profile_export_aiperf.json
Smoothness of 5.0 produces very regular arrivals (CV ≈ 0.45), reducing measurement noise while still retaining some natural variance.
Progressive Burstiness Test
Run multiple benchmarks with increasing burstiness to find where performance degrades:
for smoothness in 2.0 1.0 0.7 0.5 0.3; do
  aiperf profile \
    --model your-model \
    --url localhost:8000 \
    --endpoint-type chat \
    --streaming \
    --request-rate 100 \
    --arrival-pattern gamma \
    --arrival-smoothness "$smoothness" \
    --benchmark-duration 60 \
    --output-dir "results/smoothness_$smoothness"
done
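When plotting the results of this sweep, it can help to label each run with the coefficient of variation of its inter-arrival gaps, which for a Gamma pattern is 1/√k. A small sketch, with `gap_cv` as a hypothetical helper:

```python
import math

# Same smoothness values as the sweep above.
SMOOTHNESS_VALUES = (2.0, 1.0, 0.7, 0.5, 0.3)


def gap_cv(smoothness: float) -> float:
    """Coefficient of variation of Gamma inter-arrival gaps: 1/sqrt(k)."""
    return 1.0 / math.sqrt(smoothness)


if __name__ == "__main__":
    for k in SMOOTHNESS_VALUES:
        print(f"smoothness={k}: CV={gap_cv(k):.2f}")
```

Plotting p99 latency against CV rather than raw smoothness gives an axis that grows with burstiness, which makes the degradation point easier to read.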
Warmup with Stable Pattern, Profile with Realistic
Use constant arrivals during warmup, then realistic patterns for profiling:
aiperf profile \
--model your-model \
--url localhost:8000 \
--endpoint-type chat \
--streaming \
--request-rate 100 \
--arrival-pattern gamma \
--arrival-smoothness 0.8 \
--warmup-arrival-pattern constant \
--warmup-duration 30 \
--benchmark-duration 120
CLI Reference
| Option | Type | Default | Description |
|---|---|---|---|
| --arrival-pattern | str | poisson | Pattern for request arrivals: constant, poisson, gamma |
| --arrival-smoothness | float | None | Gamma smoothness: <1 = bursty, 1 = Poisson, >1 = smooth. Defaults to 1.0 when using gamma pattern. |
| --warmup-arrival-pattern | str | Inherits | Override pattern for warmup phase |
Constraints:
- --arrival-pattern requires --request-rate to be set
- --arrival-smoothness only applies when --arrival-pattern gamma
- Cannot use with --user-centric-rate (deterministic per-user scheduling)
- Cannot use with --fixed-schedule (timestamp-based scheduling)
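If you generate AIPerf command lines from a script, a pre-flight check along these lines can catch invalid combinations early. This is a sketch of the constraints as listed here, not AIPerf's own validation: `check_arrival_options` and its parameters are hypothetical, and it assumes the two "Cannot use with" rules apply to --arrival-pattern.

```python
from typing import Optional


def check_arrival_options(
    request_rate: Optional[float] = None,
    arrival_pattern: Optional[str] = None,
    arrival_smoothness: Optional[float] = None,
    user_centric_rate: bool = False,  # True if --user-centric-rate is set
    fixed_schedule: bool = False,     # True if --fixed-schedule is set
) -> list[str]:
    """Return a list of constraint violations (empty if the combination is valid)."""
    problems = []
    if arrival_pattern is not None:
        if arrival_pattern not in ("constant", "poisson", "gamma"):
            problems.append(f"unknown arrival pattern: {arrival_pattern}")
        if request_rate is None:
            problems.append("--arrival-pattern requires --request-rate")
        if user_centric_rate:
            problems.append("--arrival-pattern cannot be used with --user-centric-rate")
        if fixed_schedule:
            problems.append("--arrival-pattern cannot be used with --fixed-schedule")
    if arrival_smoothness is not None and arrival_pattern != "gamma":
        problems.append("--arrival-smoothness only applies with --arrival-pattern gamma")
    return problems


if __name__ == "__main__":
    print(check_arrival_options(request_rate=50, arrival_pattern="gamma", arrival_smoothness=0.5))
    print(check_arrival_options(arrival_pattern="poisson", arrival_smoothness=0.5))
```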
Pattern Selection Guide
| Goal | Pattern | Smoothness |
|---|---|---|
| Reproducible baseline | constant | N/A |
| Realistic traffic simulation | poisson | N/A |
| Match vLLM benchmark | gamma | Same as vLLM --burstiness |
| Stress test burst handling | gamma | 0.3 - 0.7 |
| Reduce measurement noise | gamma | 2.0 - 5.0 |
| Maximum throughput | N/A (burst mode) | N/A |
Understanding the Math
For those who want to understand the statistical properties:
| Pattern | Distribution | Mean | Variance | CV (Coeff. of Variation) |
|---|---|---|---|---|
| Constant | Degenerate | 1/λ | 0 | 0 |
| Poisson | Exponential | 1/λ | 1/λ² | 1 |
| Gamma(k) | Gamma | 1/λ | 1/(k·λ²) | 1/√k |
Where λ = request rate and k = smoothness.
- CV (Coefficient of Variation) = standard deviation / mean
- Lower CV = more regular arrivals
- Gamma with k=1 equals Poisson (CV=1)
- As k→∞, Gamma approaches Constant (CV→0)
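The Gamma row of the table follows from the shape/scale parameterization. A short derivation, assuming the inter-arrival time T is Gamma-distributed with shape k and a scale θ chosen to hold the mean at 1/λ (θ is a symbol introduced here for the scale):

```latex
\begin{align*}
\mathbb{E}[T] &= k\theta = \frac{1}{\lambda}
  \;\Rightarrow\; \theta = \frac{1}{k\lambda}, \\
\operatorname{Var}[T] &= k\theta^{2} = \frac{1}{k\lambda^{2}}, \\
\mathrm{CV} &= \frac{\sqrt{\operatorname{Var}[T]}}{\mathbb{E}[T]}
  = \frac{1/(\sqrt{k}\,\lambda)}{1/\lambda} = \frac{1}{\sqrt{k}}.
\end{align*}
```

Setting k = 1 recovers the Poisson row (variance 1/λ², CV = 1), and letting k grow drives both variance and CV toward the Constant row.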
Related Documentation
- Request Rate with Concurrency — Combining rate and concurrency
- Warmup Phase — Configuring warmup with different patterns
- Timing Modes Reference — Complete CLI compatibility matrix