General - Nerogar/OneTrainer GitHub Wiki
In the General tab, you define common settings for OneTrainer as well as some more specific ones, such as those for Validation and Multi-GPU.
Common Settings
Workspace Directory
(default - workspace/run): required. This directory will contain the samples generated during training, the training settings, the backups, and the TensorBoard logs. You can use a single folder or create a folder per project; the latter is useful if you work on several projects or attempts.
Cache Directory
(default - workspace-cache/run): required. Used during runtime to store your cached images and text.
Continue from last backup
(default - off): when enabled, training continues from the last backup found in the selected workspace.
Debug Mode
(default - off): switches OneTrainer to debug mode. This provides data and images showing what OneTrainer is doing, including the predicted vs. actual images of each training step. Note that you can manually enable and disable debug mode during training.
- Debug images are generated from the .pt (pickled tensor) files. This means they go through the model's VAE when they are saved as images.
- As a result, these images can be of lower quality than expected, but they are NOT the images before being sent through the VAE.
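To see what this decoding step does, here is a minimal sketch of pushing a saved latent tensor through a VAE to obtain a viewable image. The file name, the latent layout, and the VAE checkpoint are assumptions for illustration, not something this page specifies; adapt them to your model.

```python
# Minimal sketch: decode a saved latent tensor through a VAE to get a viewable image.
# Assumptions: the .pt file holds a batched latent tensor (e.g. shape [B, 4, H/8, W/8])
# and the model uses a Stable-Diffusion-style VAE.
import torch
from diffusers import AutoencoderKL
from torchvision.utils import save_image

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")  # assumed VAE checkpoint
latents = torch.load("debug/example_latents.pt")                  # hypothetical debug file

with torch.no_grad():
    # Undo the usual latent scaling before decoding (0.18215 for SD 1.x VAEs).
    images = vae.decode(latents / vae.config.scaling_factor).sample

# Map from [-1, 1] to [0, 1] and write to disk.
save_image((images + 1) / 2, "decoded_debug.png")
```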
Debug Directory
(default - debug): optional, only required if you switch on Debug Mode.
Tensorboard
(default - on): whether to start TensorBoard when training starts. TensorBoard is used to view statistics of your training runs. All TensorBoard runs in the current workspace will be available for viewing.
Expose Tensorboard
(default - off): moves TensorBoard from localhost to being exposed to the rest of the network.
Always-On Tensorboard
: starts TensorBoard and allows you to open it for a finished training (TensorBoard logs from the current Workspace Directory).
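If you prefer to inspect the event files of a past run outside OneTrainer, you can also point TensorBoard at the workspace yourself. A minimal sketch, assuming the logs live under the workspace directory shown above; the exact subfolder OneTrainer writes to may differ:

```python
# Minimal sketch: start TensorBoard manually for an existing workspace.
# The log directory below is an assumption; point --logdir at wherever your
# workspace stores its TensorBoard event files.
from tensorboard import program

tb = program.TensorBoard()
tb.configure(argv=[
    None,
    "--logdir", "workspace/run",   # assumed workspace directory from above
    # "--bind_all",                # uncomment to expose it to the rest of the network
])
url = tb.launch()
print(f"TensorBoard available at {url}")
```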
Dataloader Threads
(default - 2): used during caching. Increase it if your GPU can handle it; this will speed up caching.
Train device
(default - cuda): a free text field for choosing the device to train on. Leave it at the default or define the GPU you want to use, for example: cuda:1
Temp device
(default - cpu): a free text field for choosing where models reside when not in use. The default is CPU memory (RAM). To disable this option, the train device can be used instead (i.e. cuda).
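If you are unsure which index to put in the Train device field, you can list the GPUs PyTorch sees; index N corresponds to cuda:N. A quick sketch:

```python
# Quick sketch: list the CUDA devices PyTorch can see, so you know which
# index to use in the "Train device" field (e.g. cuda:0, cuda:1, ...).
import torch

if not torch.cuda.is_available():
    print("No CUDA device available; training would fall back to CPU.")
else:
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"cuda:{i} -> {props.name}, {props.total_memory / 1024**3:.1f} GiB")
```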
Validation
- Optional, check Validation Datasets to make the best use of this feature.
Validation
(default - off): enables validation steps and adds a card in TensorBoard if you are using it.
Validate after
: A free text field to input a number for how often you would like to validate. Units are selected using the accompanying dropdown.
Multi-GPU
- It's possible to train on several GPUs. To do this, enable Multi-GPU and define the GPUs you want to use in Device Indexes, separated by commas; if the field is left empty, all available GPUs are used. The order in Device Indexes matters: the first GPU listed is used as the master, for example 0,1 or 1,0.
Sequential model setup
: when enabled, the base model is loaded and set up on each GPU sequentially; this is slower but saves RAM.
Gradient Reduce Precision, Fused Gradient Reduce, Async Gradient Reduce, Buffer Size (MB)
: the default values give maximum precision but also use the most VRAM and take the most time per operation. If you want to reduce VRAM usage and speed up these operations, read their tooltips and experiment with the parameters.
- Note about batch size: the field in the training tab has been renamed to local batch size. When using several GPUs, local batches are executed on each GPU and then accumulated. So if you plan to train at batch size 8 with 2 GPUs, set the local batch size to 4 (see the sketch below).
- Note for Windows users: Windows handles multiple GPUs less efficiently than Linux. For LoRA and embedding training this is not very noticeable, but fine tuning suffers a lot. It is highly recommended to use the Gradient Reduce settings for multi-GPU fine tuning on Windows.
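To make the batch size note concrete, here is a small sketch of how the local batch size relates to the effective batch size across the GPUs listed in Device Indexes. The helper name is purely for illustration and is not part of OneTrainer:

```python
# Illustration only: the effective batch size is the local batch size
# multiplied by the number of GPUs listed in "Device Indexes".
def effective_batch_size(local_batch_size: int, device_indexes: str, available_gpus: int) -> int:
    indexes = [d for d in device_indexes.split(",") if d.strip()]
    num_gpus = len(indexes) if indexes else available_gpus  # empty field = all GPUs
    return local_batch_size * num_gpus

# Training at an effective batch size of 8 on 2 GPUs means a local batch size of 4.
assert effective_batch_size(4, "0,1", available_gpus=2) == 8
assert effective_batch_size(4, "", available_gpus=2) == 8
```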