Profile Image Generation Models with AIPerf
June 5, 2026 · View on GitHub
Overview
This guide shows how to benchmark image generation APIs using a Docker-based server and AIPerf. You'll learn how to:
- Set up the server
- Create an input file and run the benchmark
- View the results and extract the generated images
References
For the most up-to-date information, please refer to the following resources:
Setting up the server
Login to Hugging Face, and accept the terms of use for the following model: FLUX.1-dev.
Export your Hugging Face token as an environment variable:
export HF_TOKEN=<your-huggingface-token>
Start the Docker container:
docker run --gpus all \
--shm-size 32g \
-it \
--rm \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=$HF_TOKEN" \
--ipc=host \
lmsysorg/sglang:dev
Note
The following steps are to be performed inside the Docker container. Install the dependencies:
pip install yunchang remote_pdb imageio diffusers diffusion
Set the server arguments:
Warning
The following arguments will setup the server to use the FLUX.1-dev model on a single GPU, on port 30000. You can modify these arguments to use a different model, different number of GPUs, different port, etc. See the SGLang Diffusion CLI Reference for more details.
SERVER_ARGS=( --model-path black-forest-labs/FLUX.1-dev --text-encoder-cpu-offload --pin-cpu-memory --num-gpus 1 --port 30000 --host 0.0.0.0 )
Start the server:
sglang serve "${SERVER_ARGS[@]}"
Wait until the server is ready (watch the logs for the following message):
Uvicorn running on http://0.0.0.0:30000 (Press CTRL+C to quit)
Running the benchmark (basic usage)
Note
The following steps are to be performed on your local machine. (outside the Docker container.)
Text-to-Image Generation Using Input File
Create an input file:
cat > image_prompts.jsonl << 'EOF'
{"text": "A serene mountain landscape at sunset"}
{"text": "A futuristic city with flying cars"}
{"text": "A cute robot playing with a kitten"}
EOF
Run the benchmark:
aiperf profile \
--model black-forest-labs/FLUX.1-dev \
--tokenizer gpt2 \
--url http://localhost:30000 \
--endpoint-type image_generation \
--input-file image_prompts.jsonl \
--custom-dataset-type single_turn \
--extra-inputs size:512x512 \
--extra-inputs quality:standard \
--concurrency 1 \
--request-count 3
Done! This sends 3 requests to http://localhost:30000/v1/images/generations
Sample Output (Successful Run):
NVIDIA AIPerf | Image Generation Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┓
┃ Metric ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p50 ┃ std ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━┩
│ Request Latency (ms) │ 12,617.58 │ 12,251.41 │ 12,954.04 │ 12,947.91 │ 12,892.69 │ 12,647.29 │ 287.62 │
│ Input Sequence Length (tokens) │ 6.67 │ 6.00 │ 7.00 │ 7.00 │ 7.00 │ 7.00 │ 0.47 │
│ Request Throughput (requests/sec) │ 0.08 │ - │ - │ - │ - │ - │ - │
│ Request Count (requests) │ 3.00 │ - │ - │ - │ - │ - │ - │
└───────────────────────────────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴────────┘
Text-to-Image Generation Using Synthetic Inputs
aiperf profile \
--model black-forest-labs/FLUX.1-dev \
--tokenizer gpt2 \
--url http://localhost:30000 \
--endpoint-type image_generation \
--extra-inputs size:512x512 \
--extra-inputs quality:standard \
--synthetic-input-tokens-mean 150 \
--synthetic-input-tokens-stddev 30 \
--concurrency 1 \
--request-count 3
Done! This sends 3 requests to http://localhost:30000/v1/images/generations
Sample Output (Successful Run):
NVIDIA AIPerf | Image Generation Metrics
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┓
┃ Metric ┃ avg ┃ min ┃ max ┃ p99 ┃ p90 ┃ p50 ┃ std ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━┩
│ Request Latency (ms) │ 12,173.18 │ 11,918.37 │ 12,503.38 │ 12,495.27 │ 12,422.26 │ 12,097.79 │ 244.71 │
│ Input Sequence Length (tokens) │ 137.00 │ 107.00 │ 153.00 │ 152.96 │ 152.60 │ 151.00 │ 21.23 │
│ Request Throughput (requests/sec) │ 0.08 │ - │ - │ - │ - │ - │ - │
│ Request Count (requests) │ 3.00 │ - │ - │ - │ - │ - │ - │
└───────────────────────────────────┴───────────┴───────────┴───────────┴───────────┴───────────┴───────────┴────────┘
Understanding the Metrics
Image generation endpoints report a focused set of metrics. Unlike LLM text endpoints, there are no token-level streaming metrics (TTFT, ITL) since the image is returned as a single response.
| Metric | Description |
|---|---|
| Request Latency (ms) | End-to-end image generation time — from sending the request to receiving the complete image. This is the primary measure of image generation speed. |
| Input Sequence Length (tokens) | Token count of the text prompt used to generate the image. |
| Request Throughput (requests/sec) | Number of images generated per second across all concurrent workers. |
| Request Count (requests) | Total number of completed image generation requests. |
Tip
To increase throughput, raise --concurrency. Each concurrent worker sends requests independently, allowing multiple images to be generated in parallel.
Running the benchmark (advanced usage)
Create an input file:
cat > image_prompts.jsonl << 'EOF'
{"text": "A serene mountain landscape at sunset"}
{"text": "A futuristic city with flying cars"}
{"text": "A cute robot playing with a kitten"}
EOF
Run the benchmark:
Warning
Use --export-level raw to get the raw input/output payloads.
aiperf profile \
--model black-forest-labs/FLUX.1-dev \
--tokenizer gpt2 \
--url http://localhost:30000 \
--endpoint-type image_generation \
--input-file image_prompts.jsonl \
--custom-dataset-type single_turn \
--extra-inputs size:512x512 \
--extra-inputs quality:standard \
--concurrency 1 \
--request-count 3 \
--export-level raw
Viewing the generated images
Extract the generated images:
Copy the following code into a file called extract_images.py:
#!/usr/bin/env python3
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
"""Extract base64-encoded images from AIPerf JSONL output file."""
import base64
import json
import os
from pathlib import Path
import sys
# Read input file path
input_file = Path(sys.argv[1]) if len(sys.argv) > 1 else Path('artifacts/black-forest-labs_FLUX.1-dev-openai-image_generation-concurrency1/profile_export_raw.jsonl')
output_dir = Path(sys.argv[2]) if len(sys.argv) > 2 else Path('extracted_images')
# Create output directory
os.makedirs(output_dir, exist_ok=True)
# Process each line in the JSONL file
with open(input_file, 'r') as f:
for line_num, line in enumerate(f, 1):
record = json.loads(line)
# Extract images from responses
for response in record.get('responses', []):
response_data = json.loads(response.get('text', '{}'))
for data_idx, item in enumerate(response_data.get('data', [])):
if b64_image := item.get('b64_json'):
# Decode and save image
image_data = base64.b64decode(b64_image)
filename = output_dir / f"image_{line_num:04d}_{data_idx:02d}.jpg"
with open(filename, 'wb') as img_file:
img_file.write(image_data)
print(f"Extracted: {filename.resolve()}")
Run the script:
Tip
The script is setup to use the default directory and file names for the input and output files, but can be modified to use different files.
Usage: python extract_images.py <input_file> <output_dir>
python extract_images.py
Output:
Extracted: /path/to/extracted_images/image_0001_00.jpg
Extracted: /path/to/extracted_images/image_0001_01.jpg
Extracted: /path/to/extracted_images/image_0001_02.jpg
View the generated images:
Prompt:
{"text": "A serene mountain landscape at sunset"}

Prompt:
{"text": "A futuristic city with flying cars"}

Prompt:
{"text": "A cute robot playing with a kitten"}

Conclusion
You've successfully set up an image generation server, run your first benchmarks, and learned how to extract and view the generated images. You can now experiment with different models, prompts, and concurrency settings to optimize your image generation workloads.
Now go forth and generate!