Tips & Tricks ‐ Stylization
We can use Stable Diffusion to stylize an existing photo, and it's actually pretty easy to do. Let's use this amazing photo of James Gandolfini as a reference.
Once again, we'll require ControlNet. Install the extension and download two models: Tile and Canny. Place them in `stable-diffusion-webui\extensions\sd-webui-controlnet\models`.
It's also better to have a Negative Embedding. If you don't have any, download easynegative and place it in `stable-diffusion-webui\embeddings`.
Now, let's move on to the img2img tab. Drag and drop your image into the window. In the Prompt field, describe your image. If you're feeling lazy, you can click the Interrogate CLIP button to let the CLIP neural network describe the image for you, and then adjust the description as needed. In the Negative Prompt field, paste the name of the previously downloaded Negative Embedding, which is easynegative, or use anything else of your choice. For Sampling method and Sampling steps, choose DPM++ 2M Karras and 25, or, once again, anything of your choice. Next, click the protractor icon to set the resolution of the selected image as the generation resolution. If your image is larger than 2000 pixels on one side or your GPU doesn't have enough VRAM, you may need to reduce the resolution proportionally.
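Everything above is done through the webui interface, but as a rough point of reference, here is a minimal sketch of the same img2img setup using the diffusers library. The checkpoint repo, file paths, and the example prompt are assumptions; DPM++ 2M Karras corresponds to DPMSolverMultistepScheduler with Karras sigmas enabled.

```python
# A minimal img2img sketch with the diffusers library (assumed analogue of the webui setup).
# The checkpoint, file names, and prompt below are placeholders.
import torch
from diffusers import DPMSolverMultistepScheduler, StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# DPM++ 2M Karras == multistep DPM-Solver++ (order 2) with Karras sigmas
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)
# Rough equivalent of dropping easynegative into stable-diffusion-webui\embeddings
pipe.load_textual_inversion("easynegative.safetensors", token="easynegative")

reference = Image.open("reference.jpg").convert("RGB")    # the photo being stylized
result = pipe(
    prompt="a photo of a man in a dark suit",             # your (or Interrogate CLIP's) description
    negative_prompt="easynegative",
    image=reference,
    num_inference_steps=25,                                # Sampling steps
    strength=0.75,                                         # Denoising Strength, discussed next
).images[0]
result.save("stylized.png")
```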
One of the main parameters you'll need to adjust is Denoising Strength. It determines how much the reference image will be altered during generation. The best way to understand how it all works together is through grids. These grids illustrate the impact of Denoising Strength with different styles specified in the prompt.
(anime style, vintage cartoon style, hard lines, bold contours, by mappa studio:1.5)
(renaissance painting, oil colors, hand drawn:1.5)
As you can see, the higher the Denoising Strength, the less our reference image affects the result. You might already find some results acceptable at this stage. However, it's noticeable that with strong stylization the image deviates significantly from the reference, and we'll fix this in the following steps.
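For illustration, a Denoising Strength grid like the ones above boils down to a simple sweep. The sketch below continues the earlier diffusers example (`pipe` and `reference` are assumed to be defined there); note that diffusers does not parse the webui's `(text:1.5)` attention syntax, so the style words are left unweighted here.

```python
# Sweep Denoising Strength for one style prompt; `pipe` and `reference` come from the
# sketch above. A fixed seed keeps the comparison fair across strengths.
import torch

style_prompt = "anime style, vintage cartoon style, hard lines, bold contours, a man in a suit"
for strength in (0.5, 0.625, 0.75, 0.875, 1.0):
    generator = torch.Generator("cuda").manual_seed(12345)
    image = pipe(
        prompt=style_prompt,
        negative_prompt="easynegative",
        image=reference,
        num_inference_steps=25,
        strength=strength,
        generator=generator,
    ).images[0]
    image.save(f"denoise_{strength:.3f}.png")
```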
The final image is generated from random noise, and the seed number determines that noise. You can make it constant by clicking the recycling-arrows button (you need to generate something first, since the seed is taken from the last generated image). This way, you can track how different settings affect the result.
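In code terms (still assuming the diffusers sketch above), fixing the seed just means passing a generator seeded with the same number on every call:

```python
# The webui's recycle button reuses the seed of the last image; with diffusers you pick a
# seed yourself and pass a freshly seeded Generator on every comparable generation.
import torch

SEED = 12345                                   # any fixed value; -1 in the webui means "random"
generator = torch.Generator("cuda").manual_seed(SEED)
# result = pipe(..., generator=generator)      # pass it to every generation you want to compare
```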
Go to the ControlNet tab and drag your image into ControlNet Unit 0. Enable the Enable, Pixel Perfect, Allow Preview, and Preview as Input checkboxes. In Control Type, choose Canny; the Preprocessor and Model fields should fill in automatically. Click the icon next to Preprocessor, and you will see an image with the contours of the reference image. The level of detail in this contour map is controlled by the Canny Low Threshold and Canny High Threshold sliders. If the contour is too detailed, artifacts may appear in the final image; if there is not enough detail, the similarity to the reference image will suffer. You need to find the right balance. Another important parameter is Control Weight. It determines how much the ControlNet Unit will influence the result. We will explore its influence in more detail later on.
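The Canny preprocessor itself is plain Canny edge detection, so the two threshold sliders can be reproduced directly, for example with OpenCV (the file names below are placeholders, and 100/200 are just commonly used starting values):

```python
# Reproduce the ControlNet Canny preprocessor: the two sliders map to the two thresholds.
import cv2
import numpy as np
from PIL import Image

image = np.array(Image.open("reference.jpg").convert("RGB"))
low_threshold, high_threshold = 100, 200       # lower values -> more detailed contours
edges = cv2.Canny(image, low_threshold, high_threshold)
edges = np.stack([edges] * 3, axis=-1)         # ControlNet expects a 3-channel control image
canny_map = Image.fromarray(edges)
canny_map.save("canny_preview.png")
```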
In the meantime, go to the ControlNet Unit 1 tab and drag your image there. Enable the Enable and Pixel Perfect checkboxes. Choose Control Type - Tile, and Preprocessor - tile_colorfix+sharp.
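For completeness, here is how running the two ControlNet units together looks as a diffusers sketch (an assumed analogue, not the webui's internals). The model repos are the standard SD 1.5 ControlNet releases; the webui's tile_colorfix+sharp preprocessor has no direct diffusers equivalent, so the plain reference image is used as the Tile control image.

```python
# Two ControlNet units (Canny + Tile) on top of img2img; checkpoint and file names are
# placeholders, canny_preview.png is the map produced by the Canny sketch above.
import torch
from diffusers import (
    ControlNetModel,
    DPMSolverMultistepScheduler,
    StableDiffusionControlNetImg2ImgPipeline,
)
from PIL import Image

canny_net = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
tile_net = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1e_sd15_tile", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=[canny_net, tile_net],
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)
pipe.load_textual_inversion("easynegative.safetensors", token="easynegative")

reference = Image.open("reference.jpg").convert("RGB")
canny_map = Image.open("canny_preview.png")

result = pipe(
    prompt="anime style, vintage cartoon style, hard lines, bold contours, a man in a suit",
    negative_prompt="easynegative",
    image=reference,                            # img2img source
    control_image=[canny_map, reference],       # one control image per unit
    num_inference_steps=25,
    strength=1.0,                               # Denoising Strength
    controlnet_conditioning_scale=[0.4, 0.75],  # Canny / Tile Control Weights
).images[0]
result.save("stylized_controlnet.png")
```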
As we've set all the necessary parameters, we can finally understand what's happening, using grids of course. The X-axis represents the Control Weight of the Tile model, while the Y-axis represents the Control Weight of the Canny model.
Denoising Strength - 1.0
Denoising Strength - 0.5
It's easier to understand what's happening by examining these grids yourself than by reading an explanation, but the pattern is clear: the higher the X value, the more closely the image resembles the reference; the higher the Y value, the more the details match. On different reference images and with different Denoising Strength values, the best combinations of Control Weight can vary, but they will most likely be values somewhere in the middle of the grid (0.25-0.75).
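A grid like this is just a sweep over the two conditioning scales. Continuing the two-ControlNet sketch above (`pipe`, `reference`, and `canny_map` are assumed to be defined there):

```python
# Build an X/Y grid of Tile weight (X) versus Canny weight (Y) with a fixed seed so that
# only the two Control Weights change between cells.
import itertools
import torch

weights = (0.0, 0.25, 0.5, 0.75, 1.0)
for tile_w, canny_w in itertools.product(weights, weights):
    generator = torch.Generator("cuda").manual_seed(12345)
    cell = pipe(
        prompt="anime style, vintage cartoon style, hard lines, bold contours, a man in a suit",
        negative_prompt="easynegative",
        image=reference,
        control_image=[canny_map, reference],
        num_inference_steps=25,
        strength=1.0,
        controlnet_conditioning_scale=[canny_w, tile_w],
        generator=generator,
    ).images[0]
    cell.save(f"grid_tile{tile_w}_canny{canny_w}.png")
```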
It's also noticeable that at high Denoising Strength values, artifacts start to appear in the images toward the high end of the Y-axis. This could be due to an overly detailed contour or the stylization prompt overpowering the result. To amplify a specific part of the prompt, wrap it in parentheses with a strength coefficient, like (anime style:1.5). Let's see how this affects our result on a familiar grid :)
X - Prompt Weight, Y - Tile Control Weight
As you can see, a stronger prompt has a more significant impact on the result, but in certain situations, it can overpower the image, creating artifacts. So, this is also something to consider.
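Sweeping the prompt weight is just a matter of formatting a different coefficient into the webui's `(text:weight)` syntax; a trivial sketch (the style string is the one used above):

```python
# Generate webui-style prompts with increasing attention weight on the style block.
base_style = "anime style, vintage cartoon style, hard lines, bold contours, by mappa studio"
for weight in (1.0, 1.2, 1.4, 1.6):
    prompt = f"({base_style}:{weight})"
    print(prompt)   # paste into the webui Prompt field, or sweep it with the X/Y/Z plot script
```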
The four key settings that have the most significant impact on the stylization result, aside from the checkpoint, are:
- Denoising Strength,
- Tile Control Weight,
- Canny Control Weight,
- Prompt Strength.
The basic process of stylizing an image includes the following steps:
- Start by describing the reference image and configuring the essential parameters, but do not enable ControlNet at this point. Describe the style. Generate images with Denoising Strength from 0.5 to 1.0. This helps you assess the results, determine if you've chosen a suitable checkpoint, and verify that you've accurately described the desired style.
- Once you've generated an image with a satisfying style, make the seed constant and enable ControlNet. Adjust the Control Weight within the range from 0.25 to 0.75 and observe how well the style aligns with the reference, how close the similarity is, and whether any artifacts appear. If something isn't satisfactory, you can adjust one of the primary parameters accordingly.
- Once you've found the optimal parameters, set the seed to -1 and generate the desired number of images.
- When you want to change the style, keep in mind that depending on the style, you may need to increase or decrease the Control Weight. If you want an image in the style of, for example, H.R. Giger, a high Control Weight won't give you a stylistically good result, but with a low Control Weight, you'll lose similarity. Again, you need to find the right balance.
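To tie these stages together, here is a small hypothetical helper around the two-ControlNet sketch from earlier (`pipe`, `reference`, and `canny_map` are assumed to be defined there); stages two and three then reduce to calling it with a fixed or random seed.

```python
# Hypothetical wrapper for the workflow above; seed=None mimics the webui's seed = -1.
import random
import torch

def stylize(style_prompt, denoise, canny_weight, tile_weight, seed=None):
    if seed is None:
        seed = random.randint(0, 2**32 - 1)
    generator = torch.Generator("cuda").manual_seed(seed)
    return pipe(
        prompt=style_prompt,
        negative_prompt="easynegative",
        image=reference,
        control_image=[canny_map, reference],
        num_inference_steps=25,
        strength=denoise,
        controlnet_conditioning_scale=[canny_weight, tile_weight],
        generator=generator,
    ).images[0]

# Stage 2: fixed seed while tuning Denoising Strength and the two Control Weights.
preview = stylize("anime style, hard lines, bold contours, a man in a suit",
                  denoise=1.0, canny_weight=0.4, tile_weight=0.75, seed=12345)
# Stage 3: once satisfied, drop the fixed seed and generate a batch.
finals = [stylize("anime style, hard lines, bold contours, a man in a suit",
                  denoise=1.0, canny_weight=0.4, tile_weight=0.75) for _ in range(4)]
```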
Indeed, once you go through the entire process with a few images, it becomes much easier to intuitively adjust parameter values. Practice and experimentation are key to mastering the art of image stylization and control.
You can also choose a random artist using the Dynamic Prompts extension and the list of artists, or browse websites like this one and similar ones where the styles of different artists are visually demonstrated.
Style - (anime style, vintage cartoon style, hard lines, bold contours, by mappa studio:1.5), Denoising Strength - 1.0, Canny Control Weight - 0.4, Tile Control Weight - 0.75
Style - (renaissance painting, oil colors, hand drawn:1.5), Denoising Strength - 1.0, Canny Control Weight - 0.4, Tile Control Weight - 0.75
Style - (anime style, vintage cartoon style, hard lines, bold contours, by mappa studio:1.5), Denoising Strength - 1.0, Canny Control Weight - 0.4, Tile Control Weight - 0.75
Style - (anime style, vintage cartoon style, hard lines, bold contours, by mappa studio:1.5), Denoising Strength - 0.75, Canny Control Weight - 0.5, Tile Control Weight - 0.75
Style - (anime style, vintage cartoon style, hard lines, bold contours, by mappa studio:1.5), Denoising Strength - 1.0, Canny Control Weight - 0.4, Tile Control Weight - 0.75
Style - (renaissance painting, oil colors, hand drawn:1.5), Denoising Strength - 1.0, Canny Control Weight - 1.0, Tile Control Weight - 0.5
Next - Tips & Tricks ‐ Hidden Stamp