Diffusion Models 3: Stable Diffusion - SunXiaoXiang/Diffusers GitHub Wiki

Creating and modifying images with the Stable Diffusion model

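Use StableDiffusionPipeline to generate images from text descriptions, and experiment by varying each input argument
Get hands-on with the key components of the pipeline:
- The variational autoencoder (VAE) that makes this a "latent diffusion model"
- The tokenizer and text encoder that process the text prompt
- The UNet model itself
- The scheduler in use, along with alternative schedulers
Reproduce the sampling loop from the pipeline's components
Edit existing images with the Img2Img pipeline
Use the inpainting pipeline and the Depth2Img pipeline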

Load the pipeline

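import torch
from diffusers import StableDiffusionPipeline
device = "cuda" if torch.cuda.is_available() else "cpu"  # assumed; `device` is not defined in the original
model_id = "stabilityai/stable-diffusion-2-1-base"
# model_id = "CompVis/stable-diffusion-v1-4"  # older model
pipe = StableDiffusionPipeline.from_pretrained(model_id).to(device)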

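If you are running out of GPU memory, there are a few ways to reduce memory usage:

Load the FP16 version (not supported on every system). With this option, you also need to convert any tensors to torch.float16 when you experiment with an individual component of the pipeline: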

pipe = StableDiffusionPipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16).to(device)
Enable attention slicing. This trades a small amount of speed for lower GPU memory usage:

pipe.enable_attention_slicing()

# Set up a generator for reproducibility
generator = torch.Generator(device=device).manual_seed(42)

# Run the pipeline, showing some of the available arguments
pipe_output = pipe(
    prompt="Palette knife painting of an autumn cityscape", # What to generate
    negative_prompt="Oversaturated, blurry, low quality", # What NOT to generate
    height=480, width=640,     # Specify the image size
    guidance_scale=8,          # How strongly to follow the prompt
    num_inference_steps=35,    # How many steps to take
    generator=generator        # Fixed random seed
)

# View the resulting image:
pipe_output.images[0]

Text-to-image generation: modifying prompt and negative_prompt

The main arguments to tune:

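width and height set the size of the generated image. Both must be divisible by 8 for the variational autoencoder (VAE) to work correctly (covered in a later section).
num_inference_steps, the number of sampling steps, also affects generation quality. The default of 50 works well, but sometimes you can go as low as 20 steps, which is much more convenient for experimentation.
negative_prompt specifies what you do not want generated. It is used during classifier-free guidance (CFG) and can be a very useful way to add extra control. You can leave it empty, but many users find that listing undesired traits leads to better results.
guidance_scale determines how strong the effect of classifier-free guidance (CFG) is. Increasing it makes the output follow the text prompt more closely, but values that are too high can make the result oversaturated and unpleasant.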
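The effect of guidance_scale can be illustrated with the step that combines predictions under classifier-free guidance. This is a minimal sketch in plain Python, not the pipeline's actual code: the function name `apply_cfg` and the toy list inputs are hypothetical, and it assumes the common formulation in which the model makes one noise prediction for the negative (or empty) prompt and one for the prompt.

```python
def apply_cfg(noise_uncond, noise_text, guidance_scale):
    """Combine two noise predictions with classifier-free guidance.

    noise_uncond: prediction conditioned on the negative (or empty) prompt
    noise_text:   prediction conditioned on the actual prompt
    """
    return [
        u + guidance_scale * (t - u)
        for u, t in zip(noise_uncond, noise_text)
    ]

# Toy values standing in for real UNet outputs (hypothetical numbers)
uncond = [0.0, 1.0, -1.0]
text = [1.0, 1.0, 0.0]

print(apply_cfg(uncond, text, guidance_scale=1.0))  # [1.0, 1.0, 0.0], same as `text`
print(apply_cfg(uncond, text, guidance_scale=8.0))  # [8.0, 1.0, 7.0], pushed away from `uncond`
```

With a scale of 1 the result is simply the prompt-conditioned prediction. Larger scales exaggerate the difference between the two predictions, which is consistent with very high values producing oversaturated images.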

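A variational autoencoder (VAE) is a model that encodes its input into a compressed representation and then decodes this "latent" representation back into something close to the original input. When we generate images with Stable Diffusion, we first run the diffusion process in the VAE's latent space to produce latents, then decode them at the end to view the resulting image.

Here is an example that uses the VAE to encode an input image into a latent representation and then decode it again: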

# Create some fake data (a random image, range (-1, 1))
images = torch.rand(1, 3, 512, 512).to(device) * 2 - 1 
print("Input images shape:", images.shape)

# Encode to latent space
with torch.no_grad():
  latents = 0.18215 * pipe.vae.encode(images).latent_dist.mean
print("Encoded latents shape:", latents.shape)

# Decode again
with torch.no_grad():
  decoded_images = pipe.vae.decode(latents / 0.18215).sample
print("Decoded images shape:", decoded_images.shape)
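The shapes printed above follow a simple rule, which also explains why width and height must be divisible by 8. The sketch below assumes the Stable Diffusion VAE downsamples each spatial dimension by a factor of 8 and uses 4 latent channels; `latent_shape` is a hypothetical helper, not part of diffusers.

```python
def latent_shape(batch, height, width, downscale=8, latent_channels=4):
    """Return the latent tensor shape expected for a given image size."""
    if height % downscale or width % downscale:
        raise ValueError(f"height and width must be divisible by {downscale}")
    return (batch, latent_channels, height // downscale, width // downscale)

print(latent_shape(1, 512, 512))  # (1, 4, 64, 64)
print(latent_shape(1, 480, 640))  # (1, 4, 60, 80)
```

Under these assumptions a 512x512 RGB image (786,432 values) is represented by 16,384 latent values, so the diffusion process runs on a tensor 48 times smaller than the image.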

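Scheduler

The scheduler stores the noise schedule and manages how the noisy sample is updated based on the model's predictions. The default is PNDMScheduler, but you can use others (such as LMSDiscreteScheduler) as long as they are initialized with the same configuration.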
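To make "noise schedule" concrete, here is a minimal pure-Python sketch of the kind of table a scheduler holds and how it is used to add noise. It is not the diffusers implementation. The "scaled linear" schedule and the values `beta_start=0.00085`, `beta_end=0.012`, and 1000 steps are assumptions chosen to resemble typical Stable Diffusion scheduler configs, and all function names are hypothetical.

```python
import math
import random

def scaled_linear_betas(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Betas spaced linearly in sqrt-space, then squared."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    return [(lo + (hi - lo) * i / (num_steps - 1)) ** 2 for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta) up to and including step t."""
    out, running = [], 1.0
    for b in betas:
        running *= 1.0 - b
        out.append(running)
    return out

def add_noise(x0, noise, alpha_bar_t):
    """Forward process: mix a clean sample with noise at one timestep."""
    a, s = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + s * n for x, n in zip(x0, noise)]

betas = scaled_linear_betas()
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)
x0 = [0.5, -0.25, 1.0]                      # toy "clean" sample
noise = [rng.gauss(0.0, 1.0) for _ in x0]   # Gaussian noise

# alpha_bar shrinks with t, so later timesteps keep less of x0 and more noise
for t in (0, 500, 999):
    print(t, round(alpha_bars[t], 4), add_noise(x0, noise, alpha_bars[t]))
```

Schedulers that share a configuration share this table. They differ in how they step from a noisy sample back toward a clean one, which is why one can be swapped for another.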

Managing your model cache

Exploring different pipelines and models can fill up your disk. You can use this command to see which models you have downloaded:

!ls ~/.cache/huggingface/diffusers/ # List the contents of the cache directory
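If you also want to see how much space each entry uses, a small standard-library script can sum file sizes per subdirectory. This is a sketch under assumptions: `dir_sizes` is a hypothetical helper, the path is the one from the command above, and symlinks are skipped so that files linked from several places are counted only once.

```python
from pathlib import Path

def dir_sizes(root):
    """Return {subdirectory name: total size in bytes} for each entry under root."""
    root = Path(root).expanduser()
    sizes = {}
    if not root.is_dir():
        return sizes  # nothing cached at this location
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            sizes[entry.name] = sum(
                f.stat().st_size
                for f in entry.rglob("*")
                if f.is_file() and not f.is_symlink()
            )
    return sizes

# Path taken from the command above; adjust it if your cache lives elsewhere
for name, size in dir_sizes("~/.cache/huggingface/diffusers").items():
    print(f"{size / 1e9:8.2f} GB  {name}")
```

If the directory does not exist, the function returns an empty dictionary and nothing is printed.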