Prompt Tips - DrakeRichards/stable-diffusion-webui GitHub Wiki

Params

TL;DR: Tweak steps, CFG scale and sampler together, since results vary depending on the combination of all three

  • Encoder
    Which text encoder (and its tokenizer) to use; SD typically uses CLIP, but others can be substituted (BERT, GPT-x, etc.)
  • Batch Size
    How many images to generate in parallel, limited by your VRAM
  • Batch Count
    How many batches to run sequentially
    So the total number of images generated is batch size x batch count
  • Seed
    Initial value for the noise generator
    Use the same seed for repeatable results; otherwise use a random seed (-1)
  • CFG Scale (Classifier-Free Guidance)
    How closely the diffusion should follow the prompt: 0 means the prompt is ignored, 30 means it is followed as strictly as possible
    Best results are usually between 7 (creative) and 13 (realistic)
    A higher CFG scale also removes detail, because the noise has less impact
  • Width & Height
    SD 1.x is trained at 512x512 and SD 2.x (768-v models) at 768x768
    So typically keep these values and use upscalers if a higher resolution is needed
    However, changing the aspect ratio can change the composition of the image (e.g. portrait vs landscape gives close-up vs wider-angle results)
  • Steps
    Directly impacts performance: generation time grows with the number of steps
    How many iterative denoising steps to run; too few can give non-converged results (denoising is not complete)
    The sweet spot depends on the chosen sampler and can be as low as 10 or as high as 100
    A higher number of steps increases the definition/precision/complexity of the most important objects, but can completely remove secondary objects
    At very high step counts, most samplers converge, since all noise is eventually removed
  • Samplers
    Which algorithm performs the iterative denoising, i.e. how the model's noise prediction is used to remove noise at each step
    Different implementations of SD can prepackage different samplers
    • Different samplers may need a different number of steps before noise is removed
    • Most samplers converge at a high number of steps; ancestral samplers (e.g. Euler a) are the exception, because they add fresh noise at every step
    • Some samplers fit specific styles better
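The batch and seed settings above combine as in the following Python sketch. It assumes that every image in a run gets the base seed plus its running index, which is how AUTOMATIC1111's webui assigns seeds by default; `plan_seeds` is a hypothetical helper, not a webui API, and the random range used for `-1` is an illustrative choice.

```python
import random


def plan_seeds(seed: int, batch_size: int, batch_count: int) -> list[list[int]]:
    """Return the per-image seeds for each sequential batch (hypothetical helper)."""
    if seed == -1:
        # -1 means "pick a random seed"; the 32-bit range here is an illustrative choice.
        seed = random.randrange(2**32 - 1)
    total = batch_size * batch_count  # total images = batch size x batch count
    # Assumption: image i of the run uses base seed + i.
    seeds = [seed + i for i in range(total)]
    # Split the flat list into batch_count batches of batch_size images each.
    return [seeds[b * batch_size:(b + 1) * batch_size] for b in range(batch_count)]


print(plan_seeds(1234, batch_size=2, batch_count=3))
# [[1234, 1235], [1236, 1237], [1238, 1239]]
```

Because every seed is derived from the base seed, rerunning with the same seed and otherwise identical settings reproduces the same images, while `-1` gives a new set on each run.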
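The CFG scale is applied through the classifier-free guidance formula: at each step the model predicts the noise twice, once conditioned on the prompt and once unconditioned (an empty prompt, or the negative prompt when one is given), and the two predictions are combined as `uncond + scale * (cond - uncond)`. The sketch below uses plain Python lists in place of tensors; `apply_cfg` is a hypothetical name.

```python
def apply_cfg(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Combine unconditional and prompt-conditioned noise predictions (hypothetical helper)."""
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same shape")
    # guided = uncond + scale * (cond - uncond), element by element
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # toy prediction without the prompt
cond = [1.0, 0.5]    # toy prediction with the prompt
for scale in (0, 1, 7):
    print(scale, apply_cfg(uncond, cond, scale))
```

A scale of 0 leaves only the unconditional prediction, so the prompt is ignored; a scale of 1 returns the plain prompt-conditioned prediction, and larger values push further in the prompt's direction.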
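If you do change the aspect ratio, one way to pick dimensions is to keep the pixel count close to the training resolution. The sketch below is a hypothetical helper, not part of the webui: it assumes a square base of 512 (768 for SD 2.x 768-v models) and rounds to multiples of 8, because SD's autoencoder works on a latent image 8 times smaller in each dimension.

```python
import math


def dims_for_aspect(aspect_w: int, aspect_h: int, base: int = 512, multiple: int = 8) -> tuple[int, int]:
    """Pick width/height for an aspect ratio while keeping roughly base*base pixels."""
    ratio = aspect_w / aspect_h
    # width * height stays close to base * base, and width / height == ratio
    width = base * math.sqrt(ratio)
    height = base / math.sqrt(ratio)
    # Round each side to the nearest multiple of 8 (the latent downsampling factor).
    return multiple * round(width / multiple), multiple * round(height / multiple)


print(dims_for_aspect(2, 3))   # portrait
print(dims_for_aspect(16, 9))  # landscape
```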

Prompt Engineering

Main groups

  • Mediums: best placed at the start of the prompt, right after the artist
    Examples: painting, photograph, drawing, sketch
  • Flavors: best added as separate tokens at the end of the prompt
    Examples: ray tracing, fine art, black and white, pixiv, artstation
  • Movements: best added to the prompt with the keyword "as"
    Examples: pop art, photorealism
  • Artists: best placed at the start of the prompt
    Examples: greg rutkowski, artgerm, dc comics, picasso

Modifiers

  • Feel: best near the end
    Examples: beautiful, sharp focus, 4k, hdr, high detailed, canon 5d
  • Composition: best at the front, but only use it if results don't fit
    Examples: 1man, 1woman

Negative Prompt

  • Any keyword can be specified in a negative prompt as well
    Examples: watermark

Advanced Prompt Modifiers

  • Availability depends on the implementation
  • Specify the importance of specific words: "(word)" increases its weight, while "[word]" decreases it
  • Alternate between words: "[word1|word2]"; in AUTOMATIC1111's webui this switches between word1 and word2 on every sampling step, so both influence the same image
  • Force inclusion of multiple objects with "AND", e.g. "a cat AND a dog"
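The emphasis syntax can be sketched for a single bracketed word. This assumes AUTOMATIC1111's convention, where each pair of parentheses multiplies a word's weight by 1.1 and each pair of square brackets divides it by 1.1; other implementations may use different factors. `emphasis_weight` is a hypothetical helper, and it does not handle the explicit `(word:1.5)` form, multi-word prompts, or the `|` alternation syntax.

```python
def emphasis_weight(token: str, factor: float = 1.1) -> tuple[str, float]:
    """Return (word, weight) for a token such as 'cat', '((cat))' or '[cat]'."""
    weight = 1.0
    # Peel matching wrappers from the outside in.
    while len(token) >= 2:
        if token[0] == "(" and token[-1] == ")":
            weight *= factor  # parentheses raise the weight
        elif token[0] == "[" and token[-1] == "]":
            weight /= factor  # square brackets lower the weight
        else:
            break
        token = token[1:-1]
    return token, weight


for tok in ["cat", "(cat)", "((cat))", "[cat]"]:
    word, weight = emphasis_weight(tok)
    print(tok, word, round(weight, 3))
```

Nesting compounds the effect, so "((cat))" ends up at about 1.21 rather than 1.2.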

Hints

  • Use either artists or movements
    Do not use both, as it will confuse the model
  • Select a medium that fits the artist
    It helps the model a lot to know which medium to use when styling
  • Add an action after the subject
    Examples: portrait, standing, sitting
  • Moving terms to the front of the prompt may force a style, but limits the model's choices
    Example: cartoon drawing of a woman as pixar vs pixar drawing of a woman
  • Use both subject and scene keywords
    Example: woman on a beach

Example

(composition) (artist) (medium) (subject) (action) (scene) (movement) (feel) (flavor)
1woman greg rutkowski painting of a woman happy front portrait on a beach as photorealism, sharp focus, artstation
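The template can be turned into a small prompt builder. This Python sketch is a hypothetical helper, not part of the webui: it joins the parts in the order used by the example line, places the medium as "<medium> of <subject>", attaches the movement with "as", and appends the feel and flavor keywords as separate comma-separated tokens.

```python
from dataclasses import dataclass, field


@dataclass
class PromptParts:
    """Hypothetical container for the prompt groups described above."""
    subject: str
    composition: str = ""
    artist: str = ""
    medium: str = ""
    action: str = ""
    scene: str = ""
    movement: str = ""
    feel: list[str] = field(default_factory=list)
    flavors: list[str] = field(default_factory=list)


def build_prompt(p: PromptParts) -> str:
    """Join the parts in the order used by the example prompt."""
    # The medium goes right before the subject: "<medium> of <subject>".
    subject = f"{p.medium} of {p.subject}" if p.medium else p.subject
    # Movements are attached with the keyword "as".
    movement = f"as {p.movement}" if p.movement else ""
    head = [p.composition, p.artist, subject, p.action, p.scene, movement]
    body = " ".join(part for part in head if part)  # skip empty parts
    # Feel and flavor keywords go at the end as separate comma-separated tokens.
    return ", ".join([body, *p.feel, *p.flavors])


example = PromptParts(
    composition="1woman",
    artist="greg rutkowski",
    medium="painting",
    subject="a woman",
    action="happy front portrait",
    scene="on a beach",
    movement="photorealism",
    feel=["sharp focus"],
    flavors=["artstation"],
)
print(build_prompt(example))
# 1woman greg rutkowski painting of a woman happy front portrait on a beach as photorealism, sharp focus, artstation
```

Empty parts are skipped, so a prompt with only a subject and a scene is built the same way.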