PPDiffusers Pipelines

July 17, 2023

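Pipelines provide a simple way to run inference with state-of-the-art (SOTA) diffusion models across a range of downstream tasks. Most diffusion systems consist of several independently trained models plus a highly adaptable scheduler; a pipeline makes it easy to run such a system end to end.

For example, Stable Diffusion consists of the following components: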

  • Autoencoder
  • Conditional Unet
  • CLIP text encoder
  • Scheduler
  • CLIPFeatureExtractor
  • Safety checker

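These components are trained or created independently, yet all of them are required when running Stable Diffusion inference. A pipeline wraps the whole system and exposes a concise inference interface.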
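To make the idea of wrapping concrete, here is a minimal sketch in plain Python. It does not use PPDiffusers: `ToyPipeline` and its stand-in components are hypothetical and only mimic how independently created parts can sit behind a single call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for a pipeline; not the PPDiffusers implementation."""

    text_encoder: Callable[[str], List[float]]  # prompt -> embedding
    unet: Callable[[List[float], List[float], int], List[float]]  # (latents, embedding, t) -> noise estimate
    scheduler_steps: List[int]  # timesteps to visit, in order
    decoder: Callable[[List[float]], List[int]]  # latents -> pixel values

    def __call__(self, prompt: str) -> List[int]:
        embedding = self.text_encoder(prompt)
        # A fixed start instead of random noise keeps the toy deterministic.
        latents = [1.0] * len(embedding)
        for t in self.scheduler_steps:
            noise = self.unet(latents, embedding, t)
            latents = [x - n for x, n in zip(latents, noise)]
        return self.decoder(latents)


# Trivial stand-ins; a real system would plug in trained models here.
toy_pipe = ToyPipeline(
    text_encoder=lambda prompt: [len(word) / 10 for word in prompt.split()],
    unet=lambda latents, embedding, t: [0.1 * x for x in latents],
    scheduler_steps=[2, 1, 0],
    decoder=lambda latents: [round(255 * max(0.0, min(1.0, x))) for x in latents],
)
toy_image = toy_pipe("an astronaut riding a horse")
```

The caller only sees `toy_pipe(prompt)`; the order in which the components run is hidden inside `__call__`.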

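Through pipelines, we offer inference for all open-source SOTA diffusion systems under a unified API. Specifically, our pipelines:

  1. Load the officially released weights and reproduce the outputs of the original implementation, following the corresponding paper
  2. Provide a simple user interface for running inference with diffusion systems; see the Pipelines API section
  3. Provide easy-to-understand code that can be read alongside the official documentation; see the Pipelines Summary section
  4. Support 10+ tasks across multiple modalities; see the Task Showcase section
  5. Make it easy to connect with the community

[Note] Pipelines do not (and should not) provide any training functionality. If you are looking for training examples, see examples.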

Task Showcase

Text-Image Multimodal

 Text-to-Image Generation

text_to_image_generation-stable_diffusion

from ppdiffusers import StableDiffusionPipeline

# Load the model and scheduler
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Run the pipeline for inference
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

# Save the image
image.save("astronaut_rides_horse_sd.png")

text_to_image_generation-deepfloyd_if

import paddle

from ppdiffusers import DiffusionPipeline, IFPipeline, IFSuperResolutionPipeline
from ppdiffusers.utils import pd_to_pil

# Stage 1: generate images
pipe = IFPipeline.from_pretrained("DeepFloyd/IF-I-XL-v1.0", variant="fp16", paddle_dtype=paddle.float16)
pipe.enable_xformers_memory_efficient_attention()
prompt = 'a photo of a kangaroo wearing an orange hoodie and blue sunglasses standing in front of the eiffel tower holding a sign that says "very deep learning"'
prompt_embeds, negative_embeds = pipe.encode_prompt(prompt)
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    output_type="pd",
).images

# save intermediate image
pil_image = pd_to_pil(image)
pil_image[0].save("text_to_image_generation-deepfloyd_if-result-if_stage_I.png")
# save gpu memory
pipe.to(paddle_device="cpu")

# Stage 2: super resolution stage1
super_res_1_pipe = IFSuperResolutionPipeline.from_pretrained(
    "DeepFloyd/IF-II-L-v1.0", text_encoder=None, variant="fp16", paddle_dtype=paddle.float16
)
super_res_1_pipe.enable_xformers_memory_efficient_attention()

image = super_res_1_pipe(
    image=image,
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_embeds,
    output_type="pd",
).images
# save intermediate image
pil_image = pd_to_pil(image)
pil_image[0].save("text_to_image_generation-deepfloyd_if-result-if_stage_II.png")
# save gpu memory
super_res_1_pipe.to(paddle_device="cpu")

# Stage 3: super resolution stage2
super_res_2_pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", paddle_dtype=paddle.float16
)
super_res_2_pipe.enable_xformers_memory_efficient_attention()

image = super_res_2_pipe(
    prompt=prompt,
    image=image,
).images
image[0].save("text_to_image_generation-deepfloyd_if-result-if_stage_III.png")
[Images: if_stage_I, if_stage_II, if_stage_III]

 Text-Guided Image Upscaling

text_guided_image_upscaling-stable_diffusion_2

from ppdiffusers import StableDiffusionUpscalePipeline
from ppdiffusers.utils import load_image

pipe = StableDiffusionUpscalePipeline.from_pretrained("stabilityai/stable-diffusion-x4-upscaler")

url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/low_res_cat.png"
low_res_img = load_image(url).resize((128, 128))

prompt = "a white cat"
upscaled_image = pipe(prompt=prompt, image=low_res_img).images[0]
upscaled_image.save("upsampled_cat_sd2.png")
[Images: original image, generated image]

 Text-Guided Image Inpainting

text_guided_image_inpainting-stable_diffusion_2

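from ppdiffusers import StableDiffusionInpaintPipeline
from ppdiffusers.utils import load_image

# Inpainting needs an image plus a mask marking the region to repaint.
# The checkpoint name, the mask URL, and the prompt below are assumptions; adjust them to your setup.
pipe = StableDiffusionInpaintPipeline.from_pretrained("stabilityai/stable-diffusion-2-inpainting")

img_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"
mask_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations-mask.png"
init_image = load_image(img_url).resize((512, 512))
mask_image = load_image(mask_url).resize((512, 512))

prompt = "Face of a yellow cat, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0]
image.save("cat_on_bench_sd2.png")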
[Images: original image, generated image]

 Image-to-Image Text-Guided Generation

image_to_image_text_guided_generation-stable_diffusion

import paddle

from ppdiffusers import StableDiffusionImg2ImgPipeline
from ppdiffusers.utils import load_image

# Load the pipeline
pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Download the initial image
url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/sketch-mountains-input.png"

init_image = load_image(url).resize((768, 512))

prompt = "A fantasy landscape, trending on artstation"
# Use fp16 to speed up generation
with paddle.amp.auto_cast(True):
    image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]

image.save("fantasy_landscape.png")
[Images: original image, generated image]

 Dual Text and Image Guided Generation

dual_text_and_image_guided_generation-versatile_diffusion

from ppdiffusers import VersatileDiffusionDualGuidedPipeline
from ppdiffusers.utils import load_image

url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/benz.jpg"
image = load_image(url)
text = "a red car in the sun"

pipe = VersatileDiffusionDualGuidedPipeline.from_pretrained("shi-labs/versatile-diffusion")
pipe.remove_unused_weights()

text_to_image_strength = 0.75
image = pipe(prompt=text, image=image, text_to_image_strength=text_to_image_strength).images[0]
image.save("versatile-diffusion-red_car.png")
[Images: original image, generated image]

Text-Video Multimodal

 Text-to-Video Generation

text_to_video_generation-synth

import imageio

from ppdiffusers import DPMSolverMultistepScheduler, TextToVideoSDPipeline

pipe = TextToVideoSDPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

prompt = "An astronaut riding a horse."
video_frames = pipe(prompt, num_inference_steps=25).frames
imageio.mimsave("text_to_video_generation-synth-result-astronaut_riding_a_horse.mp4", video_frames, fps=8)

text_to_video_generation-zero

import imageio

# pip install imageio[ffmpeg]
import paddle

from ppdiffusers import TextToVideoZeroPipeline

model_id = "runwayml/stable-diffusion-v1-5"
pipe = TextToVideoZeroPipeline.from_pretrained(model_id, paddle_dtype=paddle.float16)

prompt = "A panda is playing guitar on times square"
result = pipe(prompt=prompt).images
result = [(r * 255).astype("uint8") for r in result]
imageio.mimsave("text_to_video_generation-zero-result-panda.mp4", result, fps=4)

Text-Audio Multimodal

 Text-to-Audio Generation

text_to_audio_generation-audio_ldm

import paddle
import scipy.io.wavfile  # import the submodule explicitly; a bare `import scipy` may not load it

from ppdiffusers import AudioLDMPipeline

pipe = AudioLDMPipeline.from_pretrained("cvssp/audioldm", paddle_dtype=paddle.float16)

prompt = "Techno music with a strong, upbeat tempo and high melodic riffs"
audio = pipe(prompt, num_inference_steps=10, audio_length_in_s=5.0).audios[0]

output_path = "text_to_audio_generation-audio_ldm-techno.wav"
# save the audio sample as a .wav file
scipy.io.wavfile.write(output_path, rate=16000, data=audio)

Image

 Unconditional Image Generation

unconditional_image_generation-latent_diffusion_uncond

from ppdiffusers import LDMPipeline

# Load the model and scheduler
pipe = LDMPipeline.from_pretrained("CompVis/ldm-celebahq-256")

# Run the pipeline for inference
image = pipe(num_inference_steps=200).images[0]

# Save the image
image.save("ldm_generated_image.png")

 Super-Resolution

super_resolution-latent_diffusion

import paddle

from ppdiffusers import LDMSuperResolutionPipeline
from ppdiffusers.utils import load_image

# Load the pipeline
pipe = LDMSuperResolutionPipeline.from_pretrained("CompVis/ldm-super-resolution-4x-openimages")

# Download the initial image
url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/overture-creations.png"

init_image = load_image(url).resize((128, 128))
init_image.save("original-image.png")

# Use fp16 to speed up generation
with paddle.amp.auto_cast(True):
    image = pipe(init_image, num_inference_steps=100, eta=1).images[0]

image.save("super-resolution-image.png")
[Images: original image, generated image]

 Image Inpainting

image_inpainting-repaint

from ppdiffusers import RePaintPipeline, RePaintScheduler
from ppdiffusers.utils import load_image

img_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/celeba_hq_256.png"
mask_url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/mask_256.png"

# Load the original image and the mask as PIL images
original_image = load_image(img_url).resize((256, 256))
mask_image = load_image(mask_url).resize((256, 256))

scheduler = RePaintScheduler.from_pretrained("google/ddpm-ema-celebahq-256", subfolder="scheduler")
pipe = RePaintPipeline.from_pretrained("google/ddpm-ema-celebahq-256", scheduler=scheduler)

output = pipe(
    original_image=original_image,
    mask_image=mask_image,
    num_inference_steps=250,
    eta=0.0,
    jump_length=10,
    jump_n_sample=10,
)
inpainted_image = output.images[0]

inpainted_image.save("repaint-image.png")
[Images: original image, mask image, generated image]

 Image Variation

image_variation-versatile_diffusion

from ppdiffusers import VersatileDiffusionImageVariationPipeline
from ppdiffusers.utils import load_image

url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/data/benz.jpg"
image = load_image(url)

pipe = VersatileDiffusionImageVariationPipeline.from_pretrained("shi-labs/versatile-diffusion")

image = pipe(image).images[0]
image.save("versatile-diffusion-car_variation.png")
[Images: original image, generated image]

Audio

 Unconditional Audio Generation

unconditional_audio_generation-audio_diffusion

from scipy.io.wavfile import write
from ppdiffusers import AudioDiffusionPipeline
import paddle

# Load the model and scheduler
pipe = AudioDiffusionPipeline.from_pretrained("teticio/audio-diffusion-ddim-256")
pipe.set_progress_bar_config(disable=None)
generator = paddle.Generator().manual_seed(42)

output = pipe(generator=generator)
audio = output.audios[0]
image = output.images[0]

# Save the audio locally (the loop variable no longer shadows `audio`)
for i, samples in enumerate(audio):
    write(f"audio_diffusion_test{i}.wav", pipe.mel.sample_rate, samples.transpose())

# Save the image
image.save("audio_diffusion_test.png")

unconditional_audio_generation-spectrogram_diffusion

import paddle
import scipy.io.wavfile  # import the submodule explicitly; a bare `import scipy` may not load it

from ppdiffusers import MidiProcessor, SpectrogramDiffusionPipeline
from ppdiffusers.utils.download_utils import ppdiffusers_url_download

# Download MIDI from: wget https://paddlenlp.bj.bcebos.com/models/community/junnyu/develop/beethoven_hammerklavier_2.mid
mid_file_path = ppdiffusers_url_download(
    "https://paddlenlp.bj.bcebos.com/models/community/junnyu/develop/beethoven_hammerklavier_2.mid", cache_dir="."
)
pipe = SpectrogramDiffusionPipeline.from_pretrained("google/music-spectrogram-diffusion", paddle_dtype=paddle.float16)
processor = MidiProcessor()
output = pipe(processor(mid_file_path))
audio = output.audios[0]

output_path = "unconditional_audio_generation-spectrogram_diffusion-result-beethoven_hammerklavier_2.wav"
# save the audio sample as a .wav file
scipy.io.wavfile.write(output_path, rate=16000, data=audio)

Pipelines Summary

The table below lists all supported pipelines together with their source, task, and inference script.

| Pipeline | Source | Task | Inference script |
|---|---|---|---|
| alt_diffusion | Alt Diffusion | Text-to-Image Generation | link |
| alt_diffusion | Alt Diffusion | Image-to-Image Text-Guided Generation | link |
| audio_diffusion | Audio Diffusion | Unconditional Audio Generation | link |
| controlnet | ControlNet with Stable Diffusion | Image-to-Image Text-Guided Generation | link |
| dance_diffusion | Dance Diffusion | Unconditional Audio Generation | link |
| ddpm | Denoising Diffusion Probabilistic Models | Unconditional Image Generation | link |
| ddim | Denoising Diffusion Implicit Models | Unconditional Image Generation | link |
| latent_diffusion | High-Resolution Image Synthesis with Latent Diffusion Models | Text-to-Image Generation | link |
| latent_diffusion | High-Resolution Image Synthesis with Latent Diffusion Models | Super-Resolution | link |
| latent_diffusion_uncond | High-Resolution Image Synthesis with Latent Diffusion Models | Unconditional Image Generation | link |
| paint_by_example | Paint by Example: Exemplar-based Image Editing with Diffusion Models | Image-Guided Image Inpainting | link |
| pndm | Pseudo Numerical Methods for Diffusion Models on Manifolds | Unconditional Image Generation | link |
| repaint | Repaint | Image Inpainting | link |
| score_sde_ve | Score-Based Generative Modeling through Stochastic Differential Equations | Unconditional Image Generation | link |
| semantic_stable_diffusion | Semantic Guidance | Text-Guided Generation | link |
| stable_diffusion | Stable Diffusion | Text-to-Image Generation | link |
| stable_diffusion | Stable Diffusion | Image-to-Image Text-Guided Generation | link |
| stable_diffusion | Stable Diffusion | Text-Guided Image Inpainting | link |
| stable_diffusion_2 | Stable Diffusion 2 | Text-to-Image Generation | link |
| stable_diffusion_2 | Stable Diffusion 2 | Image-to-Image Text-Guided Generation | link |
| stable_diffusion_2 | Stable Diffusion 2 | Text-Guided Image Inpainting | link |
| stable_diffusion_2 | Stable Diffusion 2 | Text-Guided Image Upscaling | link |
| stable_diffusion_2 | Stable Diffusion 2 | Text-Guided Image Upscaling | link |
| stable_diffusion_safe | Safe Stable Diffusion | Text-to-Image Generation | link |
| stochastic_karras_ve | Elucidating the Design Space of Diffusion-Based Generative Models | Unconditional Image Generation | link |
| unclip | UnCLIP | Text-to-Image Generation | link |
| versatile_diffusion | Versatile Diffusion | Text-to-Image Generation | link |
| versatile_diffusion | Versatile Diffusion | Image Variation | link |
| versatile_diffusion | Versatile Diffusion | Dual Text and Image Guided Generation | link |
| vq_diffusion | VQ Diffusion | Text-to-Image Generation | link |

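[Note] A pipeline demonstrates, end to end, the diffusion system described in the corresponding paper. However, most pipelines can also be used with a different scheduler component, or even with different model components.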
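Swapping a component can be pictured with the toy sketch below. It follows the `from_config(pipe.scheduler.config)` pattern used in the text-to-video example, but every class here (`FullScheduler`, `StridedScheduler`, `ToyPipe`) is hypothetical and written in plain Python rather than taken from PPDiffusers.

```python
class FullScheduler:
    """Hypothetical scheduler that visits every timestep."""

    def __init__(self, num_train_timesteps: int = 10):
        self.config = {"num_train_timesteps": num_train_timesteps}

    @classmethod
    def from_config(cls, config):
        # Build a scheduler from another scheduler's config dictionary.
        return cls(**config)

    def timesteps(self):
        return list(range(self.config["num_train_timesteps"] - 1, -1, -1))


class StridedScheduler(FullScheduler):
    """Hypothetical scheduler that visits every other timestep."""

    def timesteps(self):
        return super().timesteps()[::2]


class ToyPipe:
    """Hypothetical pipeline whose only component is a scheduler."""

    def __init__(self, scheduler):
        self.scheduler = scheduler

    def __call__(self):
        return self.scheduler.timesteps()


pipe = ToyPipe(FullScheduler(num_train_timesteps=10))
steps_before = pipe()

# Replace the scheduler in place, reusing the old scheduler's configuration.
pipe.scheduler = StridedScheduler.from_config(pipe.scheduler.config)
steps_after = pipe()
```

The pipeline object stays the same; only the attribute holding the scheduler changes, so the rest of the inference logic is untouched.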

Pipelines API

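A diffusion system usually consists of several independently trained models together with other components such as a scheduler. Each model is trained independently on a different task, and the scheduler can easily be swapped out. At inference time, however, we want to load all components easily and use them together, even when a component comes from a different library. To that end, every pipeline provides the following: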

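  • from_pretrained This method accepts a PaddleNLP model hub id (for example, runwayml/stable-diffusion-v1-5) or a local directory path. To load the right models and components, the directory must contain a model_index.json file.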

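  • save_pretrained This method accepts a local directory path, and all models and components of the pipeline are saved under it. A subfolder is created in that directory for each model or component. A model_index.json file is also created at the root of the directory, so that the whole pipeline can be instantiated again from the local path.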

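  • __call__ The pipeline calls this method at inference time. It defines the pipeline's inference logic and should cover the whole flow: preprocessing, forwarding tensors between the different models, and postprocessing.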
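The three methods above can be sketched with a toy round trip in plain Python. This is only an illustration under stated assumptions: `ToyComponent`, `ToyPipeline`, the `config.json` file name, and the contents written to `model_index.json` are all hypothetical and do not reflect the actual PPDiffusers file format.

```python
import json
import tempfile
from pathlib import Path


class ToyComponent:
    """Hypothetical component that stores a single value."""

    def __init__(self, value):
        self.value = value

    def save(self, folder: Path):
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "config.json").write_text(json.dumps({"value": self.value}))

    @classmethod
    def load(cls, folder: Path):
        return cls(json.loads((folder / "config.json").read_text())["value"])


class ToyPipeline:
    """Hypothetical pipeline; not the PPDiffusers implementation."""

    def __init__(self, **components):
        self.components = components

    def save_pretrained(self, save_directory):
        root = Path(save_directory)
        for name, component in self.components.items():
            component.save(root / name)  # one subfolder per component
        # The index at the root records which components belong to the pipeline.
        index = {name: type(c).__name__ for name, c in self.components.items()}
        (root / "model_index.json").write_text(json.dumps(index))

    @classmethod
    def from_pretrained(cls, directory):
        root = Path(directory)
        index = json.loads((root / "model_index.json").read_text())
        return cls(**{name: ToyComponent.load(root / name) for name in index})

    def __call__(self, x):
        # Toy "inference": pass the input through each component in order.
        for component in self.components.values():
            x = x * component.value
        return x


with tempfile.TemporaryDirectory() as tmp:
    ToyPipeline(unet=ToyComponent(2), vae=ToyComponent(3)).save_pretrained(tmp)
    saved_entries = sorted(p.name for p in Path(tmp).iterdir())
    reloaded = ToyPipeline.from_pretrained(tmp)
    result = reloaded(5)
```

Because the index file names every component, the reloaded pipeline can be rebuilt from the directory alone, which is the role the text assigns to model_index.json.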