Anime Training - Anime4000/sd_dreambooth_extension GitHub Wiki
This wiki shows the basic steps for training your own anime waifu or harem.
This is still a WIP; many things still need to be updated, tested, and experimented with.
- Prerequisite
- Dataset
- Style
- Tagging
- Training
- Training in Stages
- Generate CKPT
- Multi Concepts: create your own harem world
- Troubleshoot
  - Over Fitting
  - Over Training
  - Under Training
  - VRAM OOM (tl;dr: buy a good GPU in the first place)
- Windows 10/11
- Stable Diffusion WebUI (AUTOMATIC1111)
- Dreambooth Extension
- Dataset Tag Editor
- Decent multi-core CPU (High GHz 4-core minimum)
- 16GB RAM
- Modern Nvidia GPU
  - 10GB VRAM (Windows)
  - 8GB VRAM must use LoRA!
Dreambooth training requires a lot of memory. Linux does not support the Nvidia technology (TurboCache) that lets the driver use system RAM as an overflow buffer for graphics; on the other hand, CUDA on Linux is able to use ~99% of the VRAM.
- Training on 10GB VRAM only works on Windows 10+
- Linux users must use LoRA, or modify Kernel Mode Setting (KMS) to offload some VRAM to system RAM like Windows does
- Try killing the DE to save some VRAM and run via SSH
The workflow is similar to ELI5 Training, with different tweaks.
It's important to have a dataset that is large enough for Dreambooth to learn from, but not so large that it leads to over-fitting or over-training. A good rule of thumb is to limit your dataset to a maximum of around 30 images. Additionally, it's important to balance the number of images for each concept in your dataset. If you have a high count dataset for one concept and a low count dataset for another, the high count dataset may overpower or crush the low count dataset.
The length of your dataset for each concept can also greatly affect the number of training epochs needed. If your dataset contains less than 15 images for a particular concept, you may need to train for over 100 epochs to achieve good results. Conversely, if you have more images for a particular concept, you may be able to train for fewer epochs and still achieve good results.
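If you want to sanity-check this balance before training, here is a minimal Python sketch (not part of the extension) that counts images per concept folder and flags sets that are too small, too large, or badly unbalanced. The folder layout, path, and thresholds are assumptions taken from the guidance above.

```python
# Sketch: count images per concept folder and warn about imbalance.
from pathlib import Path

DATASET_ROOT = Path(r"X:\path\to\your\dataset")   # hypothetical path, one sub-folder per concept
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

counts = {
    folder.name: sum(1 for f in folder.iterdir() if f.suffix.lower() in IMAGE_EXTS)
    for folder in DATASET_ROOT.iterdir() if folder.is_dir()
}

for concept, n in sorted(counts.items(), key=lambda kv: kv[1]):
    note = ""
    if n < 15:
        note = "(small set: expect 100+ epochs)"
    elif n > 30:
        note = "(large set: risk of over-fitting, consider trimming)"
    print(f"{concept}: {n} images {note}")

if counts and max(counts.values()) > 2 * min(counts.values()):
    print("warning: concept counts are unbalanced; the biggest set may overpower the smallest")
```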
More info and settings can be found here: TEnc + UNET
- **Sharp and High Resolution**: pictures will be down-scaled to 512x512 pixels
- **Clear and Clean**: the waifu must be alone, without other characters in the frame
- **Diversity**: all waifu activities, locations, different backgrounds, face expressions, from above, below, the side, behind...
- **Uncomplicated**: avoid rare expressions, masks, glitched or blurry frames, ...
- **Less Close-up**: too many close-ups will make txt2img lean towards close-ups, losing its diversity
- **Text**: text on t-shirts, dialog, signboards; try to minimise this
- **Low Light**: avoid low-light scenes (dungeon, underworld)
- **Too Bright**: avoid overly bright scenes (lens flare, god rays over the character's face)
- **Wrong Aspect Ratio**: your waifu gets squeezed
- **Indistinguishable**: repeated frames, same background, same angle
- **Multiple Characters**: another character too close to your waifu that can't be cropped away
- **Background Waifu**: your waifu is not the main focus, blurred, or behind another character
- **Subtitle**: sourced from burned-in subtitles, bad screenshots
Think of Dreambooth as an employee that is learning from your dataset. Don't give it too complex of tasks or it can result in incomplete training (under-trained), errors and glitches in the training (over-trained), or the model becoming too specific to your dataset and not generalizing well to new data (over-fitting).
Check every picture manually to make sure it falls into the Good Source category, with only a few Acceptable ones!
If your source is a screenshot or a low-resolution JPEG, you need to upscale it first to reduce compression artifacts: use the built-in upscaler in the Extras tab of Stable Diffusion WebUI and choose R-ESRGAN 4x+ Anime6B at 1x to 2x.
⚠️ Upscaling 4x can lead to thick art lines, and downscaling back will be an issue!
⚠️ With the latest Dreambooth Extension you can skip the downscale step; Dreambooth's Image Bucket will downscale for you automatically and beautifully. If you still don't feel safe, proceed with a manual downscale for these reasons:
- Hide bad upscaling artifacts
- Reduce art-line thickness
- Eliminate compression artifacts
- Make it look sharp
Use XnConvert to properly downscale at the highest quality (a Pillow sketch of the same idea follows this list). Do not mix wide and portrait images in the input files; process wide or portrait first...
- Add action > Image > Resize
  - Enlarge/Reduce: Always
  - Resample: Lanczos2 (like 8x anti-aliasing)
  - Mode: Height or Mode: Width (depending on whether the batch is wide or portrait)
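For batch processing without XnConvert, a rough Pillow equivalent of the same idea is shown below: a Lanczos downscale that puts the short side at 512 px. The folder names and the 512 target are assumptions, and XnConvert's Lanczos2 filter is not byte-identical to Pillow's LANCZOS.

```python
# Sketch: high-quality Lanczos downscale so the short side lands on 512 px.
from pathlib import Path
from PIL import Image

SRC = Path("dataset_raw")        # assumed input folder
DST = Path("dataset_512")        # assumed output folder
DST.mkdir(exist_ok=True)
TARGET_SHORT_SIDE = 512

for path in SRC.glob("*.png"):
    img = Image.open(path)
    w, h = img.size
    scale = TARGET_SHORT_SIDE / min(w, h)          # constrain the short side to 512
    new_size = (round(w * scale), round(h * scale))
    img = img.resize(new_size, Image.LANCZOS)      # Lanczos keeps edges anti-aliased
    img.save(DST / path.name)
```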
With Image Bucket you can skip this step and let it pick, resize, and crop automatically. If you are not confident with Image Bucket, you can still crop manually yourself.
To accelerate training and improve training quality, it is recommended to tightly crop your dataset subject. By cropping out extraneous information from your images, your model can focus on learning the important features of the subject and reduce the amount of noise in the data. This can result in faster and more accurate training, as well as more robust models that are better able to generalize to new data.
Ratio | 512 | 1080p |
---|---|---|
1:1 | 512x512 | 1080x1080 |
7:8 | 448x512 | 945x1080 |
3:4 | 384x512 | 810x1080 |
5:8 | 320x512 | 675x1080 |
1:2 | 256x512 | 540x1080 |
These crop ratios are for vertical/portrait images and are optimized for the common 1080p resolution used in screencaps.
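As a worked example of the table, the following hedged Pillow sketch center-crops a 1080p frame to 3:4 (810x1080) and then downscales it to 384x512. File names are placeholders, and in practice you would crop around the subject rather than the exact center.

```python
# Sketch: crop a frame to one of the ratios above, then downscale to its 512-class size.
from PIL import Image

def crop_to_ratio(img: Image.Image, ratio_w: int, ratio_h: int) -> Image.Image:
    """Center-crop to ratio_w:ratio_h without stretching."""
    w, h = img.size
    target_w = min(w, h * ratio_w // ratio_h)
    target_h = min(h, w * ratio_h // ratio_w)
    left = (w - target_w) // 2
    top = (h - target_h) // 2
    return img.crop((left, top, left + target_w, top + target_h))

img = Image.open("screencap.png")                  # e.g. a 1920x1080 frame
portrait = crop_to_ratio(img, 3, 4)                # 810x1080 from a 1080-tall source
small = portrait.resize((384, 512), Image.LANCZOS) # matching 512-class size from the table
small.save("dataset_001.png")
```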
BIRME downscales using the Nearest Neighbor algorithm, which leaves your pictures without anti-aliasing; always downscale with XnConvert instead:
Waifu is in focus with a blurred background
Tightly cropped to 810x1080 (384x512)
Other characters take up < 5% of the crop area; keep such images rare in the dataset. It's better to crop tightly to reduce noise and unwanted data.
Character holding an object that covers the face
Other characters too visible in the crop area
A character cropped too close will make your model lose variation!
Make sure there is no bad data in the set; even one bad image can make your final model produce bad results.
⚠️ Always remove bad drawings from the dataset!
Versus:

| RAW | R-ESRGAN 4x+ Anime6B (1x) |
|---|---|
| *(comparison image)* | *(comparison image)* |
⚠️ Always preprocess your dataset, especially screencaps
Try to get as many full-body shots as possible. If the source image is a screenshot, stitch related frames together like this:
Avoid a visible seam between frames; instead try to merge them with a gradient so they blend:
Most raw screencaps are not ideal for training purposes, so it's recommended to manually check each image and apply Auto Level, Auto Contrast, or both. This will help improve the clarity and distinction between the subject and the background, making the images more suitable for training.
ℹ️ You can mix images that have been processed with Auto Level and/or Auto Contrast with raw images in your dataset. This can help Dreambooth learn how to reproduce colours accurately during inference.
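If you prefer to script this pass, here is a small, non-authoritative Pillow sketch using ImageOps.autocontrast as a stand-in for the Auto Level / Auto Contrast step. The folder name is an assumption, and results should still be checked by eye.

```python
# Sketch: automatic contrast/level pass over already-resized dataset images.
from pathlib import Path
from PIL import Image, ImageOps

SRC = Path("dataset_512")                          # assumed folder of resized images
for path in SRC.glob("*.png"):
    img = Image.open(path).convert("RGB")
    fixed = ImageOps.autocontrast(img, cutoff=1)   # clip 1% darkest/brightest, stretch the rest
    fixed.save(path.with_name(path.stem + "_ac.png"))
```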
This is a very important step: you need to describe what each picture shows. Manual tagging is preferred; you can use automatic DeepDanbooru or Waifu Tagger, but automatic tagging can lead to false positives.
⚠️ Keep tags as short as possible.
⚠️ Avoid repeated tags: skirt, pleated skirt should be just pleated skirt.
⚠️ The incremental tag method will be used.
⚠️ Keep common tags to the left!
⚠️ Using Danbooru tags is preferred.
To systematically organize subject names for training files, it's important to arrange them in a consistent manner, such as starting with the family name first. This will make it easier for the Stable Diffusion algorithm to search for specific tokens inside UNET neural networks, and also ensure that the subjects are properly identified.
Look around and find the most common outfit your waifu wears, and use it as the default.
Arrange your tags accordingly: the most important tag (character name) comes first, followed by clothing, expression, and so on.
Name | Clothing | Face Expression | Action | Body Direction | Camera |
---|---|---|---|---|---|
gotou hitori | black shirt | frown | standing | facing away | from above |
kita ikuyo | blue dress | smile | walking | facing to the side | from below |
shiina mahiru | school uniform, blazer | blush | lying | facing viewer | from behind |
kubo nagisa | school uniform, cardigan | blush, smile | sitting | facing back | looking at viewer |
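A tiny Python sketch of the same ordering rule: it joins the tag groups left to right exactly as in the table above and writes them to the .txt sidecar that `[filewords]` reads. The file name and tag values are examples, not required names.

```python
# Sketch: write a caption sidecar in the column order name > clothing > expression > action > direction > camera.
from pathlib import Path

def write_caption(image_path: str, *tag_groups: str) -> None:
    tags = ", ".join(t for t in tag_groups if t)     # skip empty groups, keep left-to-right order
    Path(image_path).with_suffix(".txt").write_text(tags, encoding="utf-8")

write_caption(
    "dataset_001.png",
    "gotou hitori",          # character name always first
    "black shirt",           # clothing (only if different from the default outfit)
    "frown",                 # face expression
    "standing",              # action
    "facing away",           # body direction
    "from above",            # camera
)
# dataset_001.txt now contains:
# gotou hitori, black shirt, frown, standing, facing away, from above
```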
When preparing a dataset for Dreambooth training, you can choose to omit certain information about the characters such as their hair color, eye color, and default clothing. Instead, you only need to include the anime name and background in the dataset.
During inference (txt2img), you simply provide the anime name in the prompt and the model will generate the correct eye color, hair color, etc. based on what it learned during training. This makes the process of generating images more efficient and streamlined, as you don't need to specify every detail about the characters in the prompt.
Avoid using a generic tag (e.g. `1girl`) for your waifu; it may lead to over-fitting, because other waifus would effectively become part of the trained dataset.
Anime Stable Diffusion models don't understand Left and Right; keep that in mind, since Danbooru tags don't have them!
It's possible to introduce your own prompts for left and right, but that training would be a big project and many hours of troubleshooting.
| | Camera | Body Direction | Face/Head |
|---|---|---|---|
| ⬆️ | from above | facing up | looking up |
| ⬇️ | from below | facing down | looking down |
| behind | from behind | | |
| side | from side | facing to the side | looking to the side |
| back | | facing back | looking back |
| another | | facing another | looking at another |
| camera | | facing viewer | looking at viewer |
You can combine these, e.g. `standing, facing away, looking at viewer`, to make the anime character's body face away while the head looks at you.
More tag can be found at Danbooru
The best `[filewords]` for prompting later on are character name, artist name, and eye style, so the user can mix any combination and any style.
`[filewords]`:

| default | | | | | |
|---|---|---|---|---|---|
| saitou yoshiko | | hanekoto | | bekkankou | |
| kubo nagisa | shiina mahiru | inoue takina | yamano mitsuha | sabine | sendou erika |

*(example images)*
Eyes drawn with the top eyelid slanted outwards, to the point where the outer corner of the eye is much lower than the inner corner. This usually produces a weak, gentle look and is generally given to characters with soft personalities (naturally, exceptions exist).
| tareme | | | | | |
|---|---|---|---|---|---|
| arawi keiichi | | nekotofu | | hamazi aki | |
| naganohara mio | aioi yuuko | oyama mihari | oyama mahiro | gotou hitori | ijichi nijika |

*(example images)*
When the top of the eye is drawn with a flat line. Used to effect listlessness, apathy, or a bored, expressionless, scornful, or smug face.
| jitome | | | |
|---|---|---|---|
| arawi keiichi | hamazi aki | nekotofu | |
| minakami mai | ijichi seika | oyama mahiro | nishikigi chisato |

*(example images)*
Eyes drawn with the top eyelid slanting inwards. This usually produces a strong, piercing look and is generally given to characters with forceful personalities (naturally, exceptions exist).
| tsurime | | |
|---|---|---|
| | yoshimizu kagami | tashiro tetsuya |
| sabine | hiiragi kagami | akame |

*(example images)*
Ensure that the artist's name and the art style remain the same throughout your dataset. This will help your model learn to recognize and reproduce the specific characteristics of that style.
To avoid bias and improve the model's ability to generalize, it's important to ensure that characters in your dataset are distinct from one another. If there are multiple images of the same character, try interleaving them with images of other characters to provide a more diverse set of examples for your model to learn from.
| example | artist name | hair length | hair colour | eye colour |
|---|---|---|---|---|
| *(image)* | bekkankou | long hair | purple hair | purple eyes |
| *(image)* | bekkankou | long hair | brown hair | green eyes |
| *(image)* | bekkankou | long hair | blonde hair | blue eyes |
With the trained model, to apply an art style simply invoke the artist name like this in txt2img:
masterpiece, best quality, highres, game cg, <artist name>, 1girl, <char name>, cherry blossoms, petals, flying petals, wind, upper body
masterpiece, best quality, highres, game cg, bekkankou, 1girl, sabine, cherry blossoms, petals, flying petals, wind, upper body
| bekkankou | sabine |
|---|---|
| *(example images)* | *(example images)* |
This visualization helps illustrate how Dreambooth is used for training and Stable Diffusion is used for inference. Together, these tools help the model understand the prompt and generate outputs that match it.
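If you prefer to run the prompt above from a script, a hedged sketch against the AUTOMATIC1111 txt2img API follows. The WebUI must be launched with `--api`, and the endpoint and payload fields reflect a recent WebUI build, so check your own `/docs` page if they differ; the image size and step count are assumptions.

```python
# Sketch: send the bekkankou/sabine prompt to the local WebUI txt2img API and save the result.
import base64
import requests

payload = {
    "prompt": "masterpiece, best quality, highres, game cg, bekkankou, 1girl, sabine, "
              "cherry blossoms, petals, flying petals, wind, upper body",
    "negative_prompt": "lowres, bad anatomy, bad hands, text, error, missing fingers, "
                       "extra digit, fewer digits, cropped, worst quality, low quality",
    "steps": 28,
    "cfg_scale": 12,        # same CFG the guide trains against
    "width": 512,
    "height": 768,
    "seed": 420420,
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=300)
r.raise_for_status()
image_b64 = r.json()["images"][0]           # base64-encoded PNG
with open("sabine_preview.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```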
Use the Dataset Tag Editor to speed up the tagging process; the first step is to add the character name, e.g. `gotou hitori`.
To start, load the dataset and go to:
- Batch Edit Caption
- Replace Text
- Entire Caption
- Search and Replace
All pictures now have the tag that you set!
Now, manually identify your waifu via Edit Caption of Selected Image. If the waifu is wearing something other than the default dress, tell it! Face expression, activity, etc...
After you are done, click Save all changes. Each picture will have a text file alongside it:
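If you would rather script the batch step, here is a minimal sketch that does the same thing as the Search and Replace pass: it prepends the character tag to every caption file that does not already start with it. The path and tag are examples.

```python
# Sketch: prepend a character tag to every .txt caption in the dataset folder.
from pathlib import Path

DATASET = Path(r"X:\path\to\your\dataset")   # hypothetical path
CHARACTER_TAG = "gotou hitori"

for txt in DATASET.glob("*.txt"):
    caption = txt.read_text(encoding="utf-8").strip()
    if not caption.startswith(CHARACTER_TAG):
        caption = f"{CHARACTER_TAG}, {caption}" if caption else CHARACTER_TAG
        txt.write_text(caption, encoding="utf-8")
```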
Once you are satisfied with your tags, you can begin training your favorite waifu!
- Set a Name: your model/project name, for example `UltimateWaifuMk1` (Ultimate Waifu Mark 1)
- Choose a Source Checkpoint, for example `Anything-V3.0-pruned-fp32.ckpt`
- Extract EMA Weights (optional): some ckpt files contain both EMA and non-EMA weights
- Unfreeze Model (optional): helps your training not get overwritten by global tags (e.g. `1girl`) and improves model training, reducing over-training at the cost of some VRAM
- Click the button to create the model
Avoid using a model that has been through Checkpoint Merger or is a "Merge Frankenstein"; use an original model for better training results:
- WaifuDiffusion
- AnythingV3
- NovelAI
If you have 8GB of VRAM:
- Use LoRA
**Epochs** Value: `80`
When training a Dreambooth model for anime characters, it is typically best to train for less than 100 epochs. An ideal number of epochs is around 80, particularly when using a CFG of 12 or higher. This means that the model will repeat the learning process 80 times to refine its understanding of the data and improve its ability to generate new images of anime characters. By training for fewer epochs, you can help ensure that the model converges to a good solution without overfitting to the training data.
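For a rough feel of what 80 epochs means in optimisation steps, assuming roughly 30 images and batch size 1 (both assumptions from the dataset guidance above):

```python
# Sketch: one epoch is one pass over the dataset, so total steps = images / batch_size * epochs.
images, epochs, batch_size = 30, 80, 1
steps_per_epoch = images // batch_size
total_steps = steps_per_epoch * epochs
print(total_steps)   # 2400
```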
**Save Model Frequency (Epochs)** Value: `0`
By having Dreambooth compile the model every "x" epochs, you can monitor the training process and detect when over-training is occurring. Over-training is when the model becomes too specific to the training data and starts to perform poorly on new, unseen data. By saving the model every few epochs, you can check its performance and make adjustments if necessary. However, saving the model too frequently can be harmful to the lifetime of the SSD (Solid State Drive) on which the model is stored. This is because writing data to an SSD too often can shorten its lifespan, so it's important to find a balance between monitoring the model's progress and preserving the SSD's health.
**Save Preview Frequency (Epochs)** Value: `0`
Generating a preview at the current "x" epoch allows you to view the results of the training process at that point. Ideally, this should be set to the same value as the "Save Model Frequency (Epochs)" to allow you to easily compare the model's progress. However, it is worth noting that the quality of the preview generated by Dreambooth may not be as good as the results obtained using native Inference Euler A or DDIM. Nevertheless, the general idea or concept of the preview should be enough to give you a rough idea of the model's progress and performance.
If you have a graphics card with more than 10GB of VRAM (Video Random Access Memory), you can speed up your training process by increasing the values of the relevant parameters. VRAM is a type of memory that is used by a graphics card to store and process visual data, and having more VRAM can allow your training to run faster by giving the model more memory to work with. By increasing the values of relevant parameters, you can take advantage of the additional VRAM and make the training process more efficient.
Value: 2
Value: 2
**Learning Rate** Value: `0.000001`
In Dreambooth, anime characters are treated as objects. When training a model to generate images of anime characters, it's important to choose an appropriate learning rate, which controls the speed at which the model updates its parameters during training. A good starting point for the learning rate in this scenario is 1e-6 or 0.000001. This low value helps ensure that the model updates its parameters slowly and steadily, reducing the risk of overshooting a good solution or becoming stuck in a suboptimal one. The precise value of the learning rate will depend on the specifics of your training data and model, but starting with a value of 1e-6 or 0.000001 is a good place to begin.
- LoRA UNET Learning Rate: `0.0001`
- LoRA Text Encoder Learning Rate: `0.00005`
⚠️ LoRA is a new training method that replaces full UNET fine-tuning; more testing against anime is needed!
Value: masterpiece, best quality, 1girl
- Use 8bit Adam
- Mixed Precision: `fp16`
- Memory Attention: `xformers`
- Step Ratio of Text Encoder Training: `0`
- Optional: AdamW Weight Decay: `0.01` to `0.005`
By default, you can use up to 4 concepts at a time; if you plan to use more than 4 concepts, read the Multi Concepts section.
- Dataset Directory: `X:\path\to\your\dataset`
- Instance Prompt: `[filewords]`
- Class Prompt: `[filewords]` (optional)
- Sample Image Prompt: `masterpiece, best quality, 1girl, [filewords]`
- Sample & Classification Image Negative Prompt: `lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name`
- Class Images Per Instance Image: `0` (a value of `2` to `10` is a good balance to counter over-training and over-fitting)
- Classification CFG Scale: `12` (most anime models use CFG 12, so why not)
- Sample Seed: `420420` (can be any number; useful to track sample results)
- Sample CFG Scale: `12` (same reason)
Add more concepts with the same values.
- Save in `.safetensors` format if you plan to share it on the internet
- Half Model: the model comes out at about 2GB, which also helps reduce SSD write wear
Click the Save Settings button first, then click Train!
Once training is complete, it will produce a report and a sanity-check sample, and you can start trying the model.
If you plan to train your model in stages, it's best to start with the smaller parts of the body such as legs, fingers, thighs, and so on. Training on these smaller parts first will significantly improve the overall performance of your model without destroying your waifu.
For 10GB VRAM users (RTX 3080), you can train the Text Encoder first, then resume training without the Text Encoder:
| | Stage 1 | Stage 2 |
|---|---|---|
| Epoch | 35 | 65 |
| Learning Rate | 0.000002 | 0.000001 |
| Optimizer | 8bit AdamW | 8bit AdamW |
| Mixed Precision | fp16 | fp16 |
| Memory Attention | xformers | xformers |
| Train UNET | ✅ | ✅ |
| Step Ratio of Text Encoder Training | 1 | 0 |
| Freeze CLIP Normalization Layers | ✅ | ✅ |
| Strict Tokens | ✅ | ✅ |
| **Testing Tab** | | |
| Deterministic | ✅ | ✅ |
| Use EMA for prediction | ✅ | ✅ |
Now that issue #916 has been solved, you can use the Concepts List to train multiple concepts; this gives generally better control, and each concept is trained in its own partition.
No need to do dataset interleaving anymore!
A well-organized folder structure can make it easier to manage and keep track of your dataset. By having a clear and logical arrangement of your files and folders, you can quickly find what you need and ensure that everything is in its proper place. This guide can help you establish a good folder structure, but you can also choose to disregard it if you have a different approach that works better for you.
[Project Name]
│
├── [Anime]
│   ├── [nishikigi chisato]
│   │   ├── dataset_001.png
│   │   └── dataset_001.txt
│   └── [inoue takina]
│       ├── dataset_001.png
│       └── dataset_001.txt
│
├── [Artwork]
│   ├── [artist style 1]
│   │   ├── dataset_001.png
│   │   └── dataset_001.txt
│   └── [artist style 2]
│       ├── dataset_001.png
│       └── dataset_001.txt
│
├── [CG]
│   ├── [game 1]
│   │   ├── dataset_001.png
│   │   └── dataset_001.txt
│   └── [game 2]
│       ├── dataset_001.png
│       └── dataset_001.txt
│
├── [Parts]
│   ├── [hands]
│   └── [feet]
│
└── project.json
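If you want to bootstrap this skeleton from a script, the following sketch simply creates the folders shown above; the project name and the group/concept names are placeholders.

```python
# Sketch: create the example dataset folder skeleton.
from pathlib import Path

ROOT = Path("UltimateWaifuMk1")           # [Project Name], placeholder
LAYOUT = {
    "Anime":   ["nishikigi chisato", "inoue takina"],
    "Artwork": ["artist style 1", "artist style 2"],
    "CG":      ["game 1", "game 2"],
    "Parts":   ["hands", "feet"],
}

for group, concepts in LAYOUT.items():
    for concept in concepts:
        (ROOT / group / concept).mkdir(parents=True, exist_ok=True)

(ROOT / "project.json").touch()           # empty concepts file to fill in later
```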
The following is an example of a JSON file. By using this file format, you can add as many data items as you need to train your Dreambooth Concept. The idea is to get a basic understanding of how to create and format the JSON file so that you can make use of it in your training process.
[
{
"class_data_dir": "",
"class_guidance_scale": 12,
"class_infer_steps": 40,
"class_negative_prompt": "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name",
"class_prompt": "1girl, bob cut, blonde hair, red eyes, red blazer",
"class_token": "",
"instance_data_dir": "E:\\dataset\\_FallenAngel\\Anime\\Nishikigi Chisato",
"instance_prompt": "[filewords]",
"instance_token": "",
"is_valid": true,
"n_save_sample": 1,
"num_class_images_per": 0,
"sample_seed": 420420,
"save_guidance_scale": 12,
"save_infer_steps": 40,
"save_sample_negative_prompt": "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name",
"save_sample_prompt": "masterpiece, best quality, 1girl, [filewords]",
"save_sample_template": ""
},
{
"class_data_dir": "",
"class_guidance_scale": 12,
"class_infer_steps": 40,
"class_negative_prompt": "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name",
"class_prompt": "1girl, long hair, black hair, purple eyes, open jacket",
"class_token": "",
"instance_data_dir": "E:\\dataset\\_FallenAngel\\Anime\\Inoue Takina",
"instance_prompt": "[filewords]",
"instance_token": "",
"is_valid": true,
"n_save_sample": 1,
"num_class_images_per": 0,
"sample_seed": 420420,
"save_guidance_scale": 12,
"save_infer_steps": 40,
"save_sample_negative_prompt": "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name",
"save_sample_prompt": "masterpiece, best quality, 1girl, [filewords]",
"save_sample_template": ""
}
]
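To avoid hand-editing the JSON for every concept, here is a minimal sketch that generates a concepts list in the same shape as the example above from the dataset folders. The dataset path is taken from the example, and each concept's `class_prompt` is left empty for you to fill in by hand.

```python
# Sketch: build a concepts-list JSON from one folder per concept, sharing the same sampling settings.
import json
from pathlib import Path

NEGATIVE = ("lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, "
            "fewer digits, cropped, worst quality, low quality, normal quality, "
            "jpeg artifacts, signature, watermark, username, blurry, artist name")

def concept(instance_dir: Path, class_prompt: str = "") -> dict:
    return {
        "class_data_dir": "",
        "class_guidance_scale": 12,
        "class_infer_steps": 40,
        "class_negative_prompt": NEGATIVE,
        "class_prompt": class_prompt,          # fill in per character, e.g. "1girl, bob cut, ..."
        "class_token": "",
        "instance_data_dir": str(instance_dir),
        "instance_prompt": "[filewords]",
        "instance_token": "",
        "is_valid": True,
        "n_save_sample": 1,
        "num_class_images_per": 0,
        "sample_seed": 420420,
        "save_guidance_scale": 12,
        "save_infer_steps": 40,
        "save_sample_negative_prompt": NEGATIVE,
        "save_sample_prompt": "masterpiece, best quality, 1girl, [filewords]",
        "save_sample_template": "",
    }

anime_root = Path(r"E:\dataset\_FallenAngel\Anime")     # path from the example above
concepts = [concept(d) for d in sorted(anime_root.iterdir()) if d.is_dir()]
Path("project.json").write_text(json.dumps(concepts, indent=2), encoding="utf-8")
```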
Adjust your total Epochs, Learning Rate, and Text Encoder Scale.
- Prior Loss Weight: `0.25` to `0.35` (`0.3`); the higher the number, the more the training leans towards the class images instead of your dataset
- Class Images Per Instance Image: `1` to `10`
Overfitting occurs when a model becomes too specific to the training data and does not generalize well to new data.
One way to prevent overfitting is to use a limited number of class images per instance image, around 2 to 10; this keeps the model simpler and more general, and thus less prone to overfitting.
- Try to reduce the number of tags (the `[filewords]` token); manual tagging is preferred, and keep it under 8 tags at most.
- Use Class Images and set Prior Loss Weight to around `.1` to `.5`, or the default `.75`.
Overtraining occurs when a model is trained for too long, or with too much data, and it starts to perform poorly on new, unseen data. The model has "memorized" the training data and is no longer able to generalize to new situations.
To troubleshoot overtraining, one solution is to use fewer number of epochs during training. An epoch is one complete pass through the entire training dataset. If you use too many epochs, the model may start to memorize the training data rather than learn the underlying patterns.
Another solution is to increase the learning rate. The learning rate controls how quickly the model updates its parameters during training. If the learning rate is too low, the model may take too long to converge and overtrain.
You can also reduce the text encoder scale. The text encoder is the part of the model that turns your caption tags into the conditioning the image model follows; if it is trained too heavily, it may overtrain and produce errors and glitches.
In summary, overtraining occurs when a model is trained for too long, or with too much data, and it starts to perform poorly on new, unseen data. To troubleshoot overtraining, you can use fewer number of epochs during training, increase the learning rate, or reduce the text encoder scale.
- Try reducing the number of epochs to less than `100` (the default value).
- Increase the Learning Rate.
Undertraining occurs when a model is not trained for long enough or with enough data, and it is not able to capture the patterns in the data. This can result in poor performance on the task at hand, such as producing nothing of your dataset or incomplete or wrong picture.
To troubleshoot undertraining, one solution is to resume training for the same number of epochs. An epoch is one complete pass through the entire training dataset. By resuming training for the same number of epochs, you are giving the model more opportunities to learn the patterns in the data.
Another solution is to reduce the learning rate. The learning rate controls how quickly the model updates its parameters during training. If the learning rate is too high, the model may not have enough time to converge and undertrain.
You can also increase the text encoder scale. The text encoder turns your caption tags into the conditioning the image model follows; if it is trained too little, the model may undertrain and produce nothing of your dataset, or incomplete or wrong pictures.
In summary, Under-training happens when a model is not trained for long enough or with enough data, and it is not able to capture the patterns in the data. To troubleshoot under-training, you can resume training for the same number of epochs, reduce the learning rate, or increase the text encoder scale.
- Resume training for a lower number of epochs (`20`).
- Increase the number of epochs to more than 50.
- Decrease the Learning Rate.
When training deep learning models, it is important to have enough memory (VRAM) on your GPU. This is because the model needs to store all the weights and intermediate computations during the training process.
Having a GPU with 12GB or more of VRAM, such as the RTX 3060 12GB, RTX 3080 12GB, RTX 4080 16GB, can help prevent issues such as out of memory (OOM) errors, under-training, over-training, and over-fitting.
- Use LoRA
- Train in Stages if you are a 10GB VRAM user
- Buy a new Nvidia GPU with more than 12GB of VRAM!