Tips & Tricks ‐ Stylization
We can use Stable Diffusion to stylize an existing photo, and it's actually pretty easy to do. Let's use this amazing photo of James Gandolfini as a reference.
Once again, we'll require ControlNet. Install the extension and download two models: Tile and Canny. Place them in `stable-diffusion-webui\extensions\sd-webui-controlnet\models`.
It's also better to have a Negative Embedding. If you don't have any, download easynegative and place it in `stable-diffusion-webui\embeddings`.
Now, let's move on to the img2img tab. Drag and drop your image into the window. In the Prompt field, describe your image. If you're feeling lazy, you can click the Interrogate CLIP button to let the CLIP neural network describe the image for you, and then adjust the description as needed. In the Negative Prompt field, paste the name of the previously downloaded Negative Embedding, which is easynegative, or use anything else of your choice. For Sampling method and Sampling steps, choose DPM++ 2M Karras and 25, or, once again, anything of your choice. Next, click the protractor icon to set the resolution of the selected image as the generation resolution. If your image is larger than 2000 pixels on one side or your GPU doesn't have enough VRAM, you may need to reduce the resolution proportionally.
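Everything above is done through the webui interface, but as a rough point of reference, here is a minimal sketch of the same img2img setup using the diffusers library. The checkpoint repo, file paths, and the example prompt are assumptions; DPM++ 2M Karras corresponds to DPMSolverMultistepScheduler with Karras sigmas enabled.

```python
# A minimal img2img sketch with the diffusers library (assumed analogue of the webui setup).
# The checkpoint, file names, and prompt below are placeholders.
import torch
from diffusers import DPMSolverMultistepScheduler, StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# DPM++ 2M Karras == multistep DPM-Solver++ (order 2) with Karras sigmas
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)
# Rough equivalent of dropping easynegative into stable-diffusion-webui\embeddings
pipe.load_textual_inversion("easynegative.safetensors", token="easynegative")

reference = Image.open("reference.jpg").convert("RGB")    # the photo being stylized
result = pipe(
    prompt="a photo of a man in a dark suit",             # your (or Interrogate CLIP's) description
    negative_prompt="easynegative",
    image=reference,
    num_inference_steps=25,                                # Sampling steps
    strength=0.75,                                         # Denoising Strength, discussed next
).images[0]
result.save("stylized.png")
```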
One of the main parameters you'll need to adjust is Denoising Strength. It determines how much the reference image will be altered during generation. The best way to understand how it all works together is through grids. These grids illustrate the impact of Denoising Strength with different styles specified in the prompt.
(anime style, vintage cartoon style, hard lines, bold contours, by mappa studio:1.5)
(renaissance painting, oil colors, hand drawn:1.5)
As you can see, the higher the Denoising Strength, the less our reference image affects the result. You might already find some results acceptable at this stage. However, it's noticeable that with strong stylization the image deviates significantly from the reference, and we'll fix this in the following steps.
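For illustration, a Denoising Strength grid like the ones above boils down to a simple sweep. The sketch below continues the earlier diffusers example (`pipe` and `reference` are assumed to be defined there); note that diffusers does not parse the webui's `(text:1.5)` attention syntax, so the style words are left unweighted here.

```python
# Sweep Denoising Strength for one style prompt; `pipe` and `reference` come from the
# sketch above. A fixed seed keeps the comparison fair across strengths.
import torch

style_prompt = "anime style, vintage cartoon style, hard lines, bold contours, a man in a suit"
for strength in (0.5, 0.625, 0.75, 0.875, 1.0):
    generator = torch.Generator("cuda").manual_seed(12345)
    image = pipe(
        prompt=style_prompt,
        negative_prompt="easynegative",
        image=reference,
        num_inference_steps=25,
        strength=strength,
        generator=generator,
    ).images[0]
    image.save(f"denoise_{strength:.3f}.png")
```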
The final image is generated from random noise, and the seed number determines that noise. You can make it constant by clicking the recycling-arrows button (you need to generate something first, since the seed is taken from the last generated image). This way, you can track how different settings affect the result.
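In code terms (still assuming the diffusers sketch above), fixing the seed just means passing a generator seeded with the same number on every call:

```python
# The webui's recycle button reuses the seed of the last image; with diffusers you pick a
# seed yourself and pass a freshly seeded Generator on every comparable generation.
import torch

SEED = 12345                                   # any fixed value; -1 in the webui means "random"
generator = torch.Generator("cuda").manual_seed(SEED)
# result = pipe(..., generator=generator)      # pass it to every generation you want to compare
```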
Go to the ControlNet tab and drag your image into ControlNet Unit 0. Enable the Enable, Pixel Perfect, Allow Preview, and Preview as Input checkboxes. In Control Type, choose Canny; the Preprocessor and Model fields should fill in automatically. Click the icon next to Preprocessor, and you will see an image with the contours of the reference image. The level of detail in this contour map is controlled by the Canny Low Threshold and Canny High Threshold sliders. If the contour is too detailed, artifacts may appear in the final image; if there is not enough detail, the similarity to the reference image will suffer. You need to find the right balance. Another important parameter is Control Weight. It determines how much the ControlNet Unit will influence the result. We will explore its influence in more detail later on.
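The Canny preprocessor itself is plain Canny edge detection, so the two threshold sliders can be reproduced directly, for example with OpenCV (the file names below are placeholders, and 100/200 are just commonly used starting values):

```python
# Reproduce the ControlNet Canny preprocessor: the two sliders map to the two thresholds.
import cv2
import numpy as np
from PIL import Image

image = np.array(Image.open("reference.jpg").convert("RGB"))
low_threshold, high_threshold = 100, 200       # lower values -> more detailed contours
edges = cv2.Canny(image, low_threshold, high_threshold)
edges = np.stack([edges] * 3, axis=-1)         # ControlNet expects a 3-channel control image
canny_map = Image.fromarray(edges)
canny_map.save("canny_preview.png")
```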
In the meantime, go to the ControlNet Unit 1 tab and drag your image there. Enable the Enable and Pixel Perfect checkboxes. Choose Control Type - Tile, and Preprocessor - tile_colorfix+sharp.
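For completeness, here is how running the two ControlNet units together looks as a diffusers sketch (an assumed analogue, not the webui's internals). The model repos are the standard SD 1.5 ControlNet releases; the webui's tile_colorfix+sharp preprocessor has no direct diffusers equivalent, so the plain reference image is used as the Tile control image.

```python
# Two ControlNet units (Canny + Tile) on top of img2img; checkpoint and file names are
# placeholders, canny_preview.png is the map produced by the Canny sketch above.
import torch
from diffusers import (
    ControlNetModel,
    DPMSolverMultistepScheduler,
    StableDiffusionControlNetImg2ImgPipeline,
)
from PIL import Image

canny_net = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
tile_net = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[canny_net, tile_net],
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)
pipe.load_textual_inversion("easynegative.safetensors", token="easynegative")

reference = Image.open("reference.jpg").convert("RGB")
canny_map = Image.open("canny_preview.png")

result = pipe(
    prompt="anime style, vintage cartoon style, hard lines, bold contours, a man in a suit",
    negative_prompt="easynegative",
    image=reference,                            # img2img source
    control_image=[canny_map, reference],       # one control image per unit
    num_inference_steps=25,
    strength=1.0,                               # Denoising Strength
    controlnet_conditioning_scale=[0.4, 0.75],  # Canny / Tile Control Weights
).images[0]
result.save("stylized_controlnet.png")
```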
As we've set all the necessary parameters, we can finally understand what's happening, using grids of course. The X-axis represents the Control Weight of the Tile model, while the Y-axis represents the Control Weight of the Canny model.
Denoising Strength - 1.0
Denoising Strength - 0.5
It's easier to understand what's happening by examining these grids yourself than by reading an explanation, but the pattern is clear: the higher the X value, the more closely the image resembles the reference; the higher the Y value, the more the details match. On different reference images and with different Denoising Strength values, the best combinations of Control Weight can vary, but they will most likely be values somewhere in the middle of the grid (0.25-0.75).
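A grid like this is just a sweep over the two conditioning scales. Continuing the two-ControlNet sketch above (`pipe`, `reference`, and `canny_map` are assumed to be defined there):

```python
# Build an X/Y grid of Tile weight (X) versus Canny weight (Y) with a fixed seed so that
# only the two Control Weights change between cells.
import itertools
import torch

weights = (0.0, 0.25, 0.5, 0.75, 1.0)
for tile_w, canny_w in itertools.product(weights, weights):
    generator = torch.Generator("cuda").manual_seed(12345)
    cell = pipe(
        prompt="anime style, vintage cartoon style, hard lines, bold contours, a man in a suit",
        negative_prompt="easynegative",
        image=reference,
        control_image=[canny_map, reference],
        num_inference_steps=25,
        strength=1.0,
        controlnet_conditioning_scale=[canny_w, tile_w],
        generator=generator,
    ).images[0]
    cell.save(f"grid_tile{tile_w}_canny{canny_w}.png")
```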
It's also noticeable that at high Denoising Strength values, artifacts start to appear in the images toward the high end of the Y-axis. This could be due to an overly detailed contour or the stylization prompt overpowering the result. To amplify a specific part of the prompt, wrap it in parentheses with a strength coefficient, like (anime style:1.5). Let's see how this affects our result on a familiar grid :)
X - Prompt Weight, Y - Tile Control Weight
As you can see, a stronger prompt has a more significant impact on the result, but in certain situations, it can overpower the image, creating artifacts. So, this is also something to consider.
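Sweeping the prompt weight is just a matter of formatting a different coefficient into the webui's `(text:weight)` syntax; a trivial sketch (the style string is the one used above):

```python
# Generate webui-style prompts with increasing attention weight on the style block.
base_style = "anime style, vintage cartoon style, hard lines, bold contours, by mappa studio"
for weight in (1.0, 1.2, 1.4, 1.6):
    prompt = f"({base_style}:{weight})"
    print(prompt)   # paste into the webui Prompt field, or sweep it with the X/Y/Z plot script
```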
The four key settings that have the most significant impact on the stylization result, aside from the checkpoint, are:
- Denoising Strength,
- Tile Control Weight,
- Canny Control Weight,
- Prompt Strength.
The basic process of stylizing an image includes the following steps:
- Start by describing the reference image and configuring the essential parameters, but do not enable ControlNet at this point. Describe the style. Generate images with Denoising Strength from 0.5 to 1.0. This helps you assess the results, determine if you've chosen a suitable checkpoint, and verify that you've accurately described the desired style.
- Once you've generated an image with a satisfying style, make the seed constant and enable ControlNet. Adjust the Control Weight within the range from 0.25 to 0.75 and observe how well the style aligns with the reference, how close the similarity is, and whether any artifacts appear. If something isn't satisfactory, you can adjust one of the primary parameters accordingly.
- Once you've found the optimal parameters, set the seed to -1 and generate the desired number of images.
- When you want to change the style, keep in mind that depending on the style, you may need to increase or decrease the Control Weight. If you want an image in the style of, for example, H.R. Giger, a high Control Weight won't give you a stylistically good result, but with a low Control Weight, you'll lose similarity. Again, you need to find the right balance.
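To tie these stages together, here is a small hypothetical helper around the two-ControlNet sketch from earlier (`pipe`, `reference`, and `canny_map` are assumed to be defined there); stages two and three then reduce to calling it with a fixed or random seed.

```python
# Hypothetical wrapper for the workflow above; seed=None mimics the webui's seed = -1.
import random
import torch

def stylize(style_prompt, denoise, canny_weight, tile_weight, seed=None):
    if seed is None:
        seed = random.randint(0, 2**32 - 1)
    generator = torch.Generator("cuda").manual_seed(seed)
    return pipe(
        prompt=style_prompt,
        negative_prompt="easynegative",
        image=reference,
        control_image=[canny_map, reference],
        num_inference_steps=25,
        strength=denoise,
        controlnet_conditioning_scale=[canny_weight, tile_weight],
        generator=generator,
    ).images[0]

# Stage 2: fixed seed while tuning Denoising Strength and the two Control Weights.
preview = stylize("anime style, hard lines, bold contours, a man in a suit",
                  denoise=1.0, canny_weight=0.4, tile_weight=0.75, seed=12345)
# Stage 3: once satisfied, drop the fixed seed and generate a batch.
finals = [stylize("anime style, hard lines, bold contours, a man in a suit",
                  denoise=1.0, canny_weight=0.4, tile_weight=0.75) for _ in range(4)]
```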
Indeed, once you go through the entire process with a few images, it becomes much easier to intuitively adjust parameter values. Practice and experimentation are key to mastering the art of image stylization and control.
You can also choose a random artist using the Dynamic Prompts extension and the list of artists, or browse websites like this one and similar ones where the styles of different artists are visually demonstrated.
Style - (anime style, vintage cartoon style, hard lines, bold contours, by mappa studio:1.5), Denoising Strength - 1.0, Canny Control Weight - 0.4, Tile Control Weight - 0.75
Style - (renaissance painting, oil colors, hand drawn:1.5), Denoising Strength - 1.0, Canny Control Weight - 0.4, Tile Control Weight - 0.75
Style - (anime style, vintage cartoon style, hard lines, bold contours, by mappa studio:1.5), Denoising Strength - 1.0, Canny Control Weight - 0.4, Tile Control Weight - 0.75
Style - (anime style, vintage cartoon style, hard lines, bold contours, by mappa studio:1.5), Denoising Strength - 0.75, Canny Control Weight - 0.5, Tile Control Weight - 0.75
Style - (anime style, vintage cartoon style, hard lines, bold contours, by mappa studio:1.5), Denoising Strength - 1.0, Canny Control Weight - 0.4, Tile Control Weight - 0.75
Style - (renaissance painting, oil colors, hand drawn:1.5), Denoising Strength - 1.0, Canny Control Weight - 1.0, Tile Control Weight - 0.5
Next - Tips & Tricks ‐ Hidden Stamp