Prompt Tips - DrakeRichards/stable-diffusion-webui GitHub Wiki
Params
TL;DR: Tweak steps, CFG scale, and sampler together, since results depend on the combination of all three
- Encoder
Which text encoder to use to tokenize and embed the prompt. SD typically uses CLIP, but others can be substituted (BERT, GPTx, etc.)
- Batch Size
How many images to generate in parallel, limited by your VRAM
- Batch Count
How many batches to run sequentially
So the total number of images generated is batch size x batch count
- Seed
Initializer for the noise generator
Use the same seed for repeatable results, otherwise use random (-1)
- CFG Scale (Classifier-Free Guidance)
How closely the model should follow the prompt: 0 means not at all and 30 means as strictly as possible
Best results are typically between 7 (creative) and 13 (realistic)
A higher CFG scale also removes details due to lower noise impact
- Width & Height
SD 1.x is trained on 512x512 and SD 2.x is trained on 768x768
So typically don't change those; use upscalers instead if high resolution is needed
However, changing the aspect ratio can change the composition of the image (e.g. portrait vs landscape results in close-up vs wider-angle results)
- Steps
How many iterative denoising steps to run; a low number can lead to non-converged results (denoising is not complete)
Directly impacts performance, since generation time grows with the number of steps
The sweet spot depends on the chosen sampler and can be as low as 10 or as high as 100
A higher number of steps increases definition/precision/complexity of the most important objects, but can completely remove secondary objects
At extreme step values, all samplers converge since all noise is eventually removed
- Samplers
Which algorithm to use to remove noise at each denoising step
Different implementations of SD can prepackage different samplers
  - Different samplers may require a different number of steps before noise is removed
  - Eventually all samplers converge at a high number of steps
  - Some samplers fit specific step ranges and styles better than others
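The batch arithmetic and the CFG scale described above can be sketched in a few lines of Python. This is only an illustration: `GenerationParams` and `apply_cfg` are hypothetical names, not part of any web UI, and the guidance blend shown is the commonly used classifier-free guidance formula applied to plain lists instead of real noise tensors.

```python
from dataclasses import dataclass


@dataclass
class GenerationParams:
    """Hypothetical container for the settings listed above."""
    batch_size: int = 1    # images generated in parallel (limited by VRAM)
    batch_count: int = 1   # batches run one after another
    cfg_scale: float = 7.0
    steps: int = 20
    seed: int = -1         # -1 means "pick a random seed"

    def total_images(self) -> int:
        # Total output is batch size x batch count.
        return self.batch_size * self.batch_count


def apply_cfg(uncond, cond, cfg_scale):
    """Blend two noise predictions element-wise.

    uncond: prediction made without the prompt
    cond:   prediction made with the prompt
    A scale of 0 returns the unconditional prediction (prompt ignored);
    larger scales push the result further toward the prompt.
    """
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


params = GenerationParams(batch_size=4, batch_count=3)
print(params.total_images())
print(apply_cfg([0.0, 1.0], [1.0, 3.0], 0.0))
print(apply_cfg([0.0, 1.0], [1.0, 3.0], params.cfg_scale))
```

At scale 0 the prompt-conditioned prediction has no influence, which matches the "0 means none" end of the range described above.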
Prompt Engineering
Main groups
- Mediums: best placed at the start of the prompt, right after the artist
Examples: painting, photograph, drawing, sketch
- Flavors: best left as separate tokens at the end of the prompt
Examples: ray tracing, fine art, black and white, pixiv, artstation
- Movements: best added to the prompt with the "as" keyword
Examples: pop art, photorealism
- Artists: best placed at the very start of the prompt
Examples: greg rutkowski, artgerm, dc comics, picasso
Modifiers
- Feel: best near the end
Examples: beautiful, sharp focus, 4k, hdr, highly detailed, canon 5d
- Composition: best at the front, but only use it if results don't otherwise fit
Examples: 1man, 1woman
Negative Prompt
- Any keyword can be specified in a negative prompt as well
Examples: watermark
Advanced Prompt Modifiers
- Availability depends on implementation
- Specify importance of specific words: e.g. "(word)" gives the word a higher weight while "[word]" gives it a lower weight
- Alternate between words: with "[word1|word2]" and a batch of 2, it will generate one image using word1 and one image using word2
- Force inclusion of multiple objects: join them with "AND"
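To make the "(word)" and "[word]" weighting concrete, here is a minimal Python sketch of how such a prompt could be turned into weighted chunks. The function name `parse_emphasis` and the 1.1 multiplier per bracket level are assumptions for illustration; actual factors and syntax depend on the implementation. The sketch does not handle the "[word1|word2]" alternation or "AND".

```python
import re

EMPHASIS = 1.1  # assumed multiplier per bracket level; implementations differ


def parse_emphasis(prompt: str):
    """Return (text, weight) pairs for a prompt using (word) and [word]."""
    result = []
    weight = 1.0
    # Split the prompt into single brackets and runs of plain text.
    for token in re.findall(r"[()\[\]]|[^()\[\]]+", prompt):
        if token == "(":
            weight *= EMPHASIS      # entering "(": raise the weight
        elif token == ")":
            weight /= EMPHASIS      # leaving "(": restore it
        elif token == "[":
            weight /= EMPHASIS      # entering "[": lower the weight
        elif token == "]":
            weight *= EMPHASIS      # leaving "[": restore it
        elif token.strip():
            result.append((token.strip(), round(weight, 3)))
    return result


for text, weight in parse_emphasis("a (red) car, [blurry], ((sharp))"):
    print(f"{weight:>6} {text}")
```

Nesting brackets compounds the factor, so under this assumption "((sharp))" ends up with a higher weight than "(red)".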
Hints
- Use either artists or movements
Do not use both, as it will confuse the model
- Select a medium that fits the artist
It helps the model a lot to know which medium to use when styling
- Add an action after the subject
Examples: portrait, standing, sitting
- Moving things to the front of the prompt may force styling, but limits choices
Example: cartoon drawing of a woman as pixar vs pixar drawing of a woman
- Use both subject and scene keywords
Example: woman on a beach
Example
(composition) (artist) (medium) (subject) (action) (scene) (movement) (flavor) (feel)
1woman greg rutkowski painting of a woman happy front portrait on a beach as photorealism, sharp focus, artstation
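The template above can also be assembled programmatically. The following Python sketch is one possible way to do it; `build_prompt` and its parameter names are hypothetical, and the trailing flavor and feel keywords are passed as a single `modifiers` sequence so they appear in whatever order the caller chooses.

```python
def build_prompt(subject, composition="", artist="", medium="", action="",
                 scene="", movement="", modifiers=()):
    """Assemble a prompt in the order shown in the template above."""
    parts = [composition, artist]
    # The medium introduces the subject: "painting of a woman".
    parts.append(f"{medium} of {subject}" if medium else subject)
    parts += [action, scene]
    if movement:
        # Movements are attached with the "as" keyword.
        parts.append(f"as {movement}")
    main = " ".join(p for p in parts if p)
    # Flavor and feel keywords go last as separate comma-delimited tokens.
    return ", ".join([main, *modifiers])


prompt = build_prompt(
    "a woman",
    composition="1woman",
    artist="greg rutkowski",
    medium="painting",
    action="happy front portrait",
    scene="on a beach",
    movement="photorealism",
    modifiers=("sharp focus", "artstation"),
)
print(prompt)
```

With these arguments the function reproduces the example prompt given above; leaving a field empty simply drops it from the result.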