README.md

July 3, 2023

Stochastic.ai


Welcome to x-stable-diffusion by Stochastic!

This project is a compilation of acceleration techniques for the Stable Diffusion model to help you generate images faster and more efficiently, saving you both time and money.

With example images and a comprehensive benchmark, you can easily choose the best technique for your needs. When you're ready to deploy, our CLI called stochasticx makes it easy to get started on your local machine. Try x-stable-diffusion and see the difference it can make for your image generation performance and cost savings.

🚀 Installation

Quickstart

Make sure you have Python and Docker installed on your system.

  1. Install the latest version of the stochasticx library:
     pip install stochasticx
  2. Deploy the Stable Diffusion model:
     stochasticx stable-diffusion deploy --type aitemplate

     Alternatively, you can deploy Stable Diffusion without our CLI by checking the steps here.

  3. Perform inference with the deployed model:
     stochasticx stable-diffusion inference --prompt "Riding a horse"

     Check all the options of the inference command:
     stochasticx stable-diffusion inference --help
  4. Get the logs of the deployment:
     stochasticx stable-diffusion logs
  5. Stop and remove the deployment:
     stochasticx stable-diffusion stop
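
The same quickstart can be scripted. The sketch below is a minimal illustration, assuming the `stochasticx` CLI is on your `PATH` and Docker is running. The helper names `build_command` and `run_quickstart` are hypothetical and not part of the stochasticx library; only the subcommands and flags shown in the steps above are used. The sketch does not wait for the deployment to become ready, so the inference step may need a retry in practice.

```python
import subprocess

# Subcommands taken from the quickstart steps; nothing else is assumed about the CLI.
BASE = ["stochasticx", "stable-diffusion"]


def build_command(action, *args):
    """Return the argv list for one `stochasticx stable-diffusion` subcommand."""
    return BASE + [action, *args]


def run_quickstart(prompt="Riding a horse"):
    """Deploy, run one inference, print the logs, then stop the deployment."""
    steps = [
        build_command("deploy", "--type", "aitemplate"),
        build_command("inference", "--prompt", prompt),
        build_command("logs"),
    ]
    try:
        for argv in steps:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(argv, check=True)
    finally:
        # Always attempt to stop and remove the deployment, even after a failure.
        subprocess.run(build_command("stop"), check=False)
```

Calling `run_quickstart()` runs the four steps in order. Passing argument lists rather than a shell string avoids quoting problems when the prompt contains spaces or quotes.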

How to get less than 1s latency?

Change the num_inference_steps to 30. With this, you can get an image generated in 0.88 seconds.

{
  'max_seq_length': 64,
  'num_inference_steps': 30, 
  'image_size': (512, 512) 
}

You can also experiment with reducing the image_size.
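
To reason about how far `num_inference_steps` has to drop, a rough linear model can help. The sketch below fits latency = overhead + per_step × steps through two figures quoted in this README: 0.88 s at 30 steps, and 1.38 s at 50 steps (the AITemplate fp16 result on the A100 in the benchmarks). Treating both numbers as coming from the same hardware and backend is an assumption, and a two-point fit is only an estimate, not a measurement.

```python
# Two (steps, latency in seconds) points quoted in this README.
# Assumption: both were measured on the same hardware and backend.
POINTS = [(30, 0.88), (50, 1.38)]


def fit_line(points):
    """Return (overhead, per_step) for latency = overhead + per_step * steps."""
    (s0, t0), (s1, t1) = points
    per_step = (t1 - t0) / (s1 - s0)
    overhead = t0 - per_step * s0
    return overhead, per_step


def max_steps_under(budget_s, overhead, per_step):
    """Largest whole number of steps whose estimated latency fits in budget_s."""
    return int((budget_s - overhead) // per_step)


overhead, per_step = fit_line(POINTS)
print(f"estimated overhead: {overhead:.3f} s, per step: {per_step:.4f} s")
print("estimated max steps within 1 s:", max_steps_under(1.0, overhead, per_step))
```

The fixed overhead term covers work that does not scale with the step count, which is why the latency at 30 steps is not simply 30/50 of the 50-step figure.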

How to run on Google Colab?

Each folder will include a Google Colab notebook that lets you test the full flow and run inference on a T4 GPU.

Manual deployment

Check the README.md of the following directories:

🔥 Optimizations

Benchmarks

Setup

For hardware, we used a single 40 GB A100 GPU with CUDA 11.6. Reported results are averaged over 50 runs.

The following arguments were used for image generation for all the benchmarks:

{
  'max_seq_length': 64,
  'num_inference_steps': 50, 
  'image_size': (512, 512) 
}
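
The averaging procedure described above can be sketched as a small timing harness. This is a minimal illustration, not the script used to produce the results in this README: `benchmark` and `generate` are hypothetical names, the warm-up count is an assumption, and on a GPU the timed callable would also need to synchronize the device before the timer stops.

```python
import statistics
import time


def benchmark(fn, runs=50, warmup=3):
    """Call fn() `runs` times and return the mean wall-clock latency in seconds.

    The warm-up calls are not timed; their count is an assumption, not a
    value taken from the benchmark setup.
    """
    for _ in range(warmup):
        fn()
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - start)
    return statistics.mean(latencies)


def generate():
    """Stand-in for one image generation call (hypothetical placeholder)."""
    time.sleep(0.001)


mean_latency = benchmark(generate, runs=50)
print(f"mean latency over 50 runs: {mean_latency:.4f} s")
```

Replacing `generate` with a call into the pipeline under test gives a figure comparable in spirit to the ones reported here, as long as the same generation arguments are used.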

Online results

For batch_size 1, these are the latency results:

A100 GPU

A100_GPU_graph

| project | Latency (s) | GPU VRAM (GB) |
| --- | --- | --- |
| PyTorch fp16 | 5.77 | 10.3 |
| nvFuser fp16 | 3.15 | --- |
| FlashAttention fp16 | 2.80 | 7.5 |
| TensorRT fp16 | 1.68 | 8.1 |
| AITemplate fp16 | 1.38 | 4.83 |
| ONNX (CUDA) | 7.26 | 13.3 |
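
To compare the techniques at a glance, the A100 latencies above can be turned into speedups over the PyTorch fp16 baseline. The snippet below is a small worked example that uses only the numbers from the table; the variable names are illustrative.

```python
# A100, batch_size 1 latencies in seconds, copied from the table above.
LATENCY_S = {
    "PyTorch fp16": 5.77,
    "nvFuser fp16": 3.15,
    "FlashAttention fp16": 2.80,
    "TensorRT fp16": 1.68,
    "AITemplate fp16": 1.38,
    "ONNX (CUDA)": 7.26,
}

baseline = LATENCY_S["PyTorch fp16"]

# A speedup above 1 means faster than the PyTorch fp16 baseline.
speedups = {name: baseline / latency for name, latency in LATENCY_S.items()}

for name, speedup in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22}{speedup:.2f}x")
```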

T4 GPU

Note: AITemplate might not support T4 GPUs yet. Check support here.

T4_GPU_graph

| project | Latency (s) |
| --- | --- |
| PyTorch fp16 | 16.2 |
| nvFuser fp16 | 19.3 |
| FlashAttention fp16 | 13.7 |
| TensorRT fp16 | 9.3 |

Batched results - A100 GPU

The following results were obtained by varying batch_size from 1 to 24. Each entry shows latency / GPU VRAM; OOM means the run ran out of memory.

A100_GPU_batch_size

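| project \ bs | 1 | 4 | 8 | 16 | 24 |
| --- | --- | --- | --- | --- | --- |
| PyTorch fp16 | 5.77s/10.3GB | 19.2s/18.5GB | 36s/26.7GB | OOM | |
| FlashAttention fp16 | 2.80s/7.5GB | 9.1s/17GB | 17.7s/29.5GB | OOM | |
| TensorRT fp16 | 1.68s/8.1GB | OOM | | | |
| AITemplate fp16 | 1.38s/4.83GB | 4.25s/8.5GB | 7.4s/14.5GB | 15.7s/25GB | 23.4s/36GB |
| ONNX (CUDA) | 7.26s/13.3GB | OOM | OOM | OOM | OOM |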

Note: TensorRT fails to convert the UNet model from ONNX to TensorRT due to memory issues.
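
Batching pays off when the latency per image drops as the batch grows. As a worked example, the snippet below divides the AITemplate fp16 latencies from the batched results above by their batch sizes. It uses only the reported numbers, and the names are illustrative.

```python
# AITemplate fp16 on A100: batch_size -> total latency in seconds (from the table above).
AITEMPLATE_LATENCY_S = {1: 1.38, 4: 4.25, 8: 7.4, 16: 15.7, 24: 23.4}

# Seconds per generated image at each batch size.
per_image = {bs: total / bs for bs, total in AITEMPLATE_LATENCY_S.items()}

for bs, seconds in per_image.items():
    print(f"batch_size={bs:<3} {seconds:.3f} s/image")

best = min(per_image, key=per_image.get)
print("lowest per-image latency at batch_size", best)
```

By these numbers, per-image latency is lowest at batch_size 8 and roughly flat beyond it, while GPU memory use keeps growing with the batch.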

Sample images generated

Click here to view the complete list of generated images

| Optimization \ Prompt | Super Mario learning to fly in an airport, Painting by Leonardo Da Vinci | The Easter bunny riding a motorcycle in New York City | Drone flythrough of a tropical jungle covered in snow |
| --- | --- | --- | --- |
| PyTorch fp16 | pytorch_stable-diffusion_mario | pytorch_stable-diffusion_bunny | pytorch_stable-diffusion_bunny |
| nvFuser fp16 | nvFuser_stable-diffusion_mario | nvFuser_stable-diffusion_bunny | nvFuser_stable-diffusion_bunny |
| FlashAttention fp16 | FlashAttention_stable-diffusion_mario | FlashAttention_stable-diffusion_bunny | FlashAttention_stable-diffusion_bunny |
| TensorRT fp16 | TensorRT_stable-diffusion_mario | TensorRT_stable-diffusion_bunny | TensorRT_stable-diffusion_bunny |
| AITemplate fp16 | AITemplate_stable-diffusion_mario | AITemplate_stable-diffusion_bunny | AITemplate_stable-diffusion_bunny |

References

🌎 Join our community

🌎 Contributing

As an open source project in a rapidly evolving field, we welcome contributions of all kinds, including new features and better documentation. Please read our contributing guide to learn how you can get involved.


For managed hosting on our cloud or on your private cloud: [Contact us →](https://stochastic.ai/contact)