Trace Analysis on Diffusion Model Serving

May 21, 2025

Dataset Overview

This dataset captures the invocation characteristics of approximately 550,000 text-to-image requests collected over 20 days.

Key features of the dataset include:

  • Base Models: The primary models used for text-to-image generation.
  • ControlNet: A collection of ControlNet models applied to refine outputs.
  • LoRA: A list of LoRA (Low-Rank Adaptation) models utilized for fine-tuning.
  • Services: Data spans two distinct services (Service A and Service B).

Each request entry includes the following fields:

  • timestamp: The time when the request was made.
  • ip: The name of the serving instance.
  • scene_id: The service identifier (Service A or Service B).
  • base_model_id: The anonymized ID of the base model.
  • controlnets: A list of anonymized ControlNet model IDs applied.
  • loras: A list of anonymized LoRA model IDs utilized.
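As a rough illustration of the fields listed above, one request entry could be modelled as follows. This is a minimal sketch, not the layout used by the analysis scripts: the field names come from the list above, while the value types (for example, a string timestamp) and every sample value are assumptions made up for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Request:
    timestamp: Any  # assumption: the trace may store a string or a number here
    ip: str
    scene_id: str
    base_model_id: str
    controlnets: List[str] = field(default_factory=list)
    loras: List[str] = field(default_factory=list)


def parse_request(raw: dict) -> Request:
    """Build a Request from one decoded JSON object.

    Missing or null adapter lists are treated as empty lists.
    """
    return Request(
        timestamp=raw["timestamp"],
        ip=raw["ip"],
        scene_id=raw["scene_id"],
        base_model_id=raw["base_model_id"],
        controlnets=list(raw.get("controlnets") or []),
        loras=list(raw.get("loras") or []),
    )


# Hypothetical entry: all values below are invented for illustration.
sample = json.loads(
    '{"timestamp": "2025-01-01 00:00:00", "ip": "instance-0", "scene_id": "A", '
    '"base_model_id": "base-0", "controlnets": [], "loras": ["lora-1", "lora-2"]}'
)
request = parse_request(sample)
print(request.scene_id, len(request.loras))
```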

The dataset, named diffusion_model_request_trace.json, is available for download from ModelScope.

Download Instructions

To download the dataset, run the following command:

$ modelscope download --dataset 'mental2008/T2I-Model-Serving-Request-Trace' diffusion_model_request_trace.json --local_dir .

Ensure you have the modelscope CLI installed. If not, install it via:

$ pip install modelscope

Statistical Analysis

Run a statistical analysis of the requests from Service A or Service B with:

# For Service A
$ python statistical_analysis.py --service A

# For Service B
$ python statistical_analysis.py --service B

Invocation Analysis

ControlNet

Analyze the invocation patterns of ControlNet models with:

$ python controlnet_invocation_analysis.py

Result location: figures/controlnet_invocation.pdf.
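The script itself is not reproduced here. One plausible way to summarise invocation patterns is to count how often each adapter ID appears and then look at the cumulative share of invocations covered by the most popular adapters. The sketch below assumes each request is a dict with the fields described in the dataset overview; the function names and the sample IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def invocation_counts(requests: Iterable[Dict], field_name: str) -> Counter:
    """Count how many times each adapter ID appears in the given list field."""
    counts: Counter = Counter()
    for req in requests:
        counts.update(req.get(field_name) or [])
    return counts


def cumulative_share(counts: Counter) -> List[float]:
    """Cumulative fraction of invocations, from the most to the least popular adapter."""
    total = sum(counts.values())
    if total == 0:
        return []
    shares = []
    running = 0
    for _, n in counts.most_common():
        running += n
        shares.append(running / total)
    return shares


# Hypothetical requests: the adapter IDs are invented for illustration.
requests = [
    {"controlnets": ["cn-1"]},
    {"controlnets": ["cn-1", "cn-2"]},
    {"controlnets": []},
    {"controlnets": ["cn-1"]},
]
counts = invocation_counts(requests, "controlnets")
shares = cumulative_share(counts)
print(counts.most_common(), shares)
```

Passing "loras" as the field name gives the same summary for LoRA models.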

LoRA

Analyze the invocation patterns of LoRA models with:

$ python lora_invocation_analysis.py

Result location: figures/lora_invocation.pdf.

Adapter Loading Analysis

Investigate adapter loading behavior under different caching policies (LRU and LFU) for both services:

# LRU policy
$ python adapter_loading_analysis.py --service A --cache-policy LRU
$ python adapter_loading_analysis.py --service B --cache-policy LRU

# LFU policy
$ python adapter_loading_analysis.py --service A --cache-policy LFU
$ python adapter_loading_analysis.py --service B --cache-policy LFU
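To make the difference between the two policies concrete, the sketch below counts how many adapter loads (cache misses) each policy incurs on a short access sequence. It is an illustration under stated assumptions, not the logic of adapter_loading_analysis.py: the cache capacity, the tie-breaking rule for LFU (least recently used among equally frequent adapters), and the choice to reset an adapter's frequency when it is evicted are all assumptions.

```python
from collections import Counter, OrderedDict
from typing import Dict, Iterable, Set


def count_loads_lru(accesses: Iterable[str], capacity: int) -> int:
    """Number of loads when the least recently used adapter is evicted."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache: "OrderedDict[str, bool]" = OrderedDict()
    loads = 0
    for adapter in accesses:
        if adapter in cache:
            cache.move_to_end(adapter)  # mark as most recently used
            continue
        loads += 1
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the least recently used entry
        cache[adapter] = True
    return loads


def count_loads_lfu(accesses: Iterable[str], capacity: int) -> int:
    """Number of loads when the least frequently used adapter is evicted."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache: Set[str] = set()
    freq: Counter = Counter()
    last_used: Dict[str, int] = {}
    loads = 0
    for step, adapter in enumerate(accesses):
        if adapter not in cache:
            loads += 1
            if len(cache) >= capacity:
                # Assumption: ties on frequency go to the least recently used adapter.
                victim = min(cache, key=lambda a: (freq[a], last_used[a]))
                cache.remove(victim)
                del freq[victim]  # assumption: frequency restarts after eviction
            cache.add(adapter)
        freq[adapter] += 1
        last_used[adapter] = step
    return loads


# Hypothetical access sequence of adapter IDs.
accesses = ["a", "b", "a", "c", "b", "a"]
lru_loads = count_loads_lru(accesses, capacity=2)
lfu_loads = count_loads_lfu(accesses, capacity=2)
print(lru_loads, lfu_loads)
```

On this sequence LRU evicts the frequently used adapter "a" once, while LFU keeps it resident, so LFU performs one load fewer.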

Result locations:

Unique LoRA Loading Analysis

Investigate the loading of unique LoRAs across different instances for both services:

# Service A
$ python unique_lora_loading_analysis.py --service A

# Service B
$ python unique_lora_loading_analysis.py --service B
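The core of such an analysis can be sketched as grouping requests by serving instance and collecting the distinct LoRA IDs each instance has seen. The snippet below assumes requests are dicts with the ip, scene_id, and loras fields described in the dataset overview. The scene_id values "A" and "B" and all sample IDs are assumptions, and the real script may compute something different.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def unique_loras_per_instance(requests: Iterable[Dict], service: str) -> Dict[str, Set[str]]:
    """Map each serving instance to the set of distinct LoRA IDs it was asked to use."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for req in requests:
        if req.get("scene_id") != service:
            continue
        seen[req["ip"]].update(req.get("loras") or [])
    return dict(seen)


# Hypothetical requests: instance names and LoRA IDs are invented for illustration.
requests = [
    {"ip": "instance-0", "scene_id": "A", "loras": ["lora-1", "lora-2"]},
    {"ip": "instance-0", "scene_id": "A", "loras": ["lora-1"]},
    {"ip": "instance-1", "scene_id": "A", "loras": ["lora-3"]},
    {"ip": "instance-1", "scene_id": "B", "loras": ["lora-4"]},
]
per_instance = unique_loras_per_instance(requests, "A")
for instance, loras in sorted(per_instance.items()):
    print(instance, len(loras))
```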

Result locations: