Trace Analysis on Diffusion Model Serving
May 21, 2025
Dataset Overview
This dataset captures the invocation characteristics of approximately 550,000 text-to-image requests collected over a span of 20 days.
Key features of the dataset include:
- Base Models: The primary models used for text-to-image generation.
- ControlNet: A collection of ControlNet models applied to refine outputs.
- LoRA: A list of LoRA (Low-Rank Adaptation) models utilized for fine-tuning.
- Services: Data spans two distinct services (Service A and Service B).
Each request entry includes the following fields:
- timestamp: The time when the request was made.
- ip: The name of the serving instance.
- scene_id: The service identifier (Service A or Service B).
- base_model_id: The anonymized ID of the base model.
- controlnets: A list of anonymized ControlNet model IDs applied.
- loras: A list of anonymized LoRA model IDs utilized.
The dataset is a single file, diffusion_model_request_trace.json, available for download from ModelScope.
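As a minimal sketch of how a trace with the fields above might be consumed, the following assumes the JSON file holds a single top-level array of request objects and that `scene_id` carries one label per service. Neither assumption is confirmed here, so inspect the file before relying on it. The helper names `load_trace` and `count_by_service` and the sample records are hypothetical.

```python
import json
from collections import Counter


def load_trace(path):
    """Load the trace, ASSUMING a top-level JSON array of request objects."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def count_by_service(requests):
    """Count requests per `scene_id` value."""
    return Counter(request["scene_id"] for request in requests)


# Synthetic records for illustration only; values are not from the real trace.
sample_requests = [
    {"timestamp": 0, "ip": "inst-0", "scene_id": "A", "base_model_id": "m0",
     "controlnets": ["c0"], "loras": ["l0", "l1"]},
    {"timestamp": 1, "ip": "inst-1", "scene_id": "B", "base_model_id": "m1",
     "controlnets": [], "loras": ["l0"]},
    {"timestamp": 2, "ip": "inst-0", "scene_id": "A", "base_model_id": "m0",
     "controlnets": [], "loras": []},
]
sample_counts = count_by_service(sample_requests)
```

With the real file, `count_by_service(load_trace("diffusion_model_request_trace.json"))` would give the per-service request totals, provided the assumed layout holds.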
Download Instructions
To download the dataset, run the following command:
$ modelscope download --dataset 'mental2008/T2I-Model-Serving-Request-Trace' diffusion_model_request_trace.json --local_dir .
Ensure you have the modelscope CLI installed. If not, install it via:
$ pip install modelscope
Statistical Analysis
You can perform statistical analysis on requests originating from either Service A or Service B using the following commands:
# For Service A
$ python statistical_analysis.py --service A
# For Service B
$ python statistical_analysis.py --service B
Invocation Analysis
ControlNet
Analyze the invocation patterns of ControlNet models with:
$ python controlnet_invocation_analysis.py
Result location: figures/controlnet_invocation.pdf.
LoRA
Analyze the invocation patterns of LoRA models with:
$ python lora_invocation_analysis.py
Result location: figures/lora_invocation.pdf.
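What the two invocation scripts compute internally is not described here. One plausible view of an invocation pattern is how unevenly requests are spread across adapters, which the sketch below illustrates. It assumes each request is a dict whose `controlnets` and `loras` fields are lists of IDs; `invocation_counts`, `top_k_share`, and the sample data are hypothetical and are not taken from the repository.

```python
from collections import Counter


def invocation_counts(requests, field):
    """Count how many requests invoke each adapter ID listed under `field`."""
    counts = Counter()
    for request in requests:
        # set() so an ID repeated inside one request is counted once.
        counts.update(set(request.get(field) or []))
    return counts


def top_k_share(counts, k):
    """Fraction of all invocations that go to the k most-invoked adapters."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total


# Synthetic requests for illustration only.
sample_lora_requests = [
    {"loras": ["l0", "l1"]},
    {"loras": ["l0"]},
    {"loras": ["l0", "l2"]},
    {"loras": []},
]
lora_counts = invocation_counts(sample_lora_requests, "loras")
lora_top1_share = top_k_share(lora_counts, 1)
```

Passing `"controlnets"` as `field` gives the same summary for ControlNet models.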
Adapter Loading Analysis
Investigate adapter loading behavior under different caching policies (LRU and LFU) for both services:
# LRU policy
$ python adapter_loading_analysis.py --service A --cache-policy LRU
$ python adapter_loading_analysis.py --service B --cache-policy LRU
# LFU policy
$ python adapter_loading_analysis.py --service A --cache-policy LFU
$ python adapter_loading_analysis.py --service B --cache-policy LFU
Result locations:
- LRU policy
  - Service A: figures/adapter_loading_LRU_A.pdf
  - Service B: figures/adapter_loading_LRU_B.pdf
- LFU policy
  - Service A: figures/adapter_loading_LFU_A.pdf
  - Service B: figures/adapter_loading_LFU_B.pdf
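To make the difference between the two policies concrete, the sketch below counts how many adapter loads (cache misses) a stream of adapter IDs would trigger under each one. It is an illustration under stated assumptions, not the logic of adapter_loading_analysis.py: the cache capacity is measured in number of adapters, LFU counts are reset when an adapter is evicted, and LFU ties are broken by least recent use. The function names are hypothetical.

```python
from collections import OrderedDict


def count_loads_lru(adapter_stream, capacity):
    """Count adapter loads (misses) under least-recently-used eviction."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache = OrderedDict()  # keys ordered from least to most recently used
    loads = 0
    for adapter in adapter_stream:
        if adapter in cache:
            cache.move_to_end(adapter)  # hit: mark as most recently used
            continue
        loads += 1
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the least recently used entry
        cache[adapter] = True
    return loads


def count_loads_lfu(adapter_stream, capacity):
    """Count adapter loads under least-frequently-used eviction.

    Assumption: frequency is tracked only while an adapter is cached, and
    ties are broken by evicting the least recently used candidate.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    freq = {}       # adapter -> uses since it was last loaded
    last_used = {}  # adapter -> index of its most recent use
    loads = 0
    for step, adapter in enumerate(adapter_stream):
        if adapter in freq:
            freq[adapter] += 1
        else:
            loads += 1
            if len(freq) >= capacity:
                victim = min(freq, key=lambda a: (freq[a], last_used[a]))
                del freq[victim]
                del last_used[victim]
            freq[adapter] = 1
        last_used[adapter] = step
    return loads


# Synthetic stream for illustration only.
sample_stream = ["a", "b", "a", "c", "b", "a"]
lru_loads = count_loads_lru(sample_stream, capacity=2)
lfu_loads = count_loads_lfu(sample_stream, capacity=2)
```

On this toy stream LFU keeps the frequently used adapter `a` resident and so triggers fewer loads than LRU; on other streams the ordering can reverse.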
Unique LoRA Loading Analysis
Investigate how many unique LoRAs each serving instance loads, for both services:
# Service A
$ python unique_lora_loading_analysis.py --service A
# Service B
$ python unique_lora_loading_analysis.py --service B
Result locations:
- Service A: figures/unique_loras_A.pdf
- Service B: figures/unique_loras_B.pdf
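The per-instance view can be sketched by grouping requests on the `ip` field and collecting the distinct LoRA IDs each instance has served. This is a hedged illustration rather than the contents of unique_lora_loading_analysis.py; it assumes requests are dicts with `ip` and `loras` keys as listed in the dataset overview, and `unique_loras_per_instance` and the sample data are hypothetical.

```python
from collections import defaultdict


def unique_loras_per_instance(requests):
    """Map each serving instance (`ip`) to the set of distinct LoRA IDs it served."""
    seen = defaultdict(set)
    for request in requests:
        seen[request["ip"]].update(request.get("loras") or [])
    return dict(seen)


# Synthetic requests for illustration only.
sample_instance_requests = [
    {"ip": "inst-0", "loras": ["l0", "l1"]},
    {"ip": "inst-1", "loras": ["l0"]},
    {"ip": "inst-0", "loras": ["l1", "l2"]},
    {"ip": "inst-1", "loras": []},
]
unique_by_instance = unique_loras_per_instance(sample_instance_requests)
unique_counts = {ip: len(ids) for ip, ids in unique_by_instance.items()}
```

Filtering the requests by `scene_id` before calling the function would give the per-service breakdown.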