Concepts Tab Overview

The Concepts tab configures where OneTrainer finds your training data. A concept can contain training data, regularization data, or any other data you want to use during training.

Concepts Tab UI

[Image: Concepts tab UI]

The tab comprises the following elements:

  • Dropdown Menu (default: concepts):
    Select among multiple concept configurations. OneTrainer only trains using the currently selected configuration.

  • Add Config:
    Opens an interface to enter a name and create a new concept configuration.

  • Add Concept:
    Creates a new, blank concept with default settings in the current configuration.

  • Delete Concept (red X):
    Removes a concept from the configuration.

  • Duplicate Concept (green plus):
    Copies an existing concept along with all its settings.

  • Enable Concept (toggle, default: on):
    Toggles training for this concept. When enabled, the toggle appears blue.

  • Edit Concept:
    Clicking on any part of the concept image (other than buttons and toggles) opens a settings window for further modifications.

Concepts Settings - General

[Image: Concept settings, General tab]

This section covers basic information, balancing, and caching settings.

  • Name (Default: Blank):
    Enter a name for your concept. If left blank, it defaults to the name of the folder set in Path when the window is closed.

  • Enabled (Default: True):
    A toggle mirroring the one on the Concepts tab; it controls whether OneTrainer trains using this concept.

  • Path (Default: Blank):
    Enter or paste the path to your concept images. You can also click the (...) button to browse for the folder.

  • Prompt Source (Default: from text file per sample):
    Choose one of three options:

    1. From text file per sample:
      For example, 0001.jpg uses 0001.txt as its prompt. If the file contains multiple captions (one per line), OneTrainer randomly selects one each epoch.
    2. From single text file:
      Uses one text file as the prompt for all images.
    3. From image file name:
      For example, tag1 tag2 tag3.jpg uses “tag1 tag2 tag3” as the prompt.
  • Include Subdirectories (Default: False):
    Treats images in subdirectories as part of a single concept for easier management.

  • Image Variations (Default: 1):
    Controls how many augmented variations of each image are cached. Required when combining image augmentations with latent caching; otherwise every epoch reuses the same cached augmentation.

  • Text Variations (Default: 1):
    Defines how many prompt variations are cached for each image. (Note: when training the text encoder (TE) or embeddings, prompts aren't cached, so this setting is unnecessary. However, if you cache text because you are not training the TE or embeddings, you must increase the number of variations or the cached prompt will always be the same.)

  • Balancing (Default: 1 Repeats):
    Balances the contribution of each concept. This is useful when you have, for example, 10,000 regularization images alongside 100 source images. There are two modes (see the sketch after this list):

    • Repeats (Default: 1, i.e., none):
      Multiplies the concept's image count by the defined value each epoch.
    • Sample:
      Uses an exact number of randomly selected images per epoch regardless of the total available.
  • Loss Weight (Default: 1):
    Adjusts each concept’s impact during training. Lower values can be used for concepts like regularization images if they are overly influential.
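
A rough sketch of the balancing arithmetic, using the 10,000-regularization / 100-source example above (illustrative Python only; the function name and mode labels are made up here, not OneTrainer's internals):

```python
# Illustrative balancing arithmetic, not OneTrainer's actual code.
def samples_per_epoch(image_count: int, value: float, mode: str) -> int:
    if mode == "REPEATS":
        # Every image in the concept is seen `value` times per epoch.
        return round(image_count * value)
    if mode == "SAMPLES":
        # A fixed number of randomly selected images per epoch,
        # regardless of how many images the concept contains.
        return round(value)
    raise ValueError(f"unknown balancing mode: {mode}")

# 100 source images repeated 3x, and 10,000 regularization images
# capped at 300 random samples per epoch:
print(samples_per_epoch(100, 3, "REPEATS"))       # 300
print(samples_per_epoch(10_000, 300, "SAMPLES"))  # 300
```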

Image Augmentation Tab

[Image: Image Augmentation tab]

This tab provides options to diversify your dataset through image augmentations, which is particularly important for small datasets. Augmentations can be random or fixed, a setting that requires either disabling latent caching (or re-caching every epoch) or using image variations.

  • Update Preview:
    Generates a preview of your augmentations. For random settings, repeated clicks will give you a better overall view.

  • Crop Jitter (Default: On):
    If OneTrainer needs to crop an image, this option applies a random, non-centered crop to add variety.

  • Random Flip (Default: On):
    Flips the image along the vertical axis, either randomly or fixed.

  • Random Rotation (Default: Off - 0):
    Rotates the image by a random amount up to the specified number of degrees, or by a fixed angle if set.

  • Random Brightness (Default: Off - 0), Random Contrast (Default: Off - 0), Random Saturation (Default: Off - 0), Random Hue (Default: Off - 0):
    Each of these settings adjusts the corresponding image property by a random variance or by a fixed value.

  • Resolution Override (Default: Off - 512):
    Overrides the training resolution of the concept. When disabled, OneTrainer rescales the images to your training resolution(s). Possible values: a single resolution, several resolutions separated by commas, or a single resolution in the format width x height. If you set several resolutions, the resolution selection is random, so you need to increase Image Variations. Keep in mind the selection is still random when caching images, meaning that with 2 resolutions and Image Variations set to 2, the same resolution can be cached twice (see the sketch after this list). Every image in the concept will have one of the resolutions applied randomly.

    • When activated, it can be used for two purposes:
      • With multi-resolution training (several training resolutions separated by commas), it matches concept images to the training resolution of the same size.
      • To prevent image upscaling: you can train at a target resolution of 1024 with 512 or 256 images that won't get upscaled; training for those images is done at 512 or 256 instead. This can help with low-quality images.
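
Because each cached variation picks its resolution independently at random, duplicates are possible. A tiny illustrative simulation (the override value here is hypothetical):

```python
import random

# Illustrative only: with Resolution Override set to two resolutions and
# Image Variations set to 2, each cached variation picks its resolution
# at random, so both variations can land on the same resolution.
resolutions = [512, 768]  # hypothetical override value "512,768"
variations = [random.choice(resolutions) for _ in range(2)]
print(variations)  # e.g. [512, 512] -- the same resolution cached twice
```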

Text Augmentation Tab

[Image: Text Augmentation tab]

Text augmentations modify each image's caption. A caption is the full text defined by the "Prompt Source" setting on the first tab and is split into "tags" using a custom Delimiter (usually a comma). Each epoch generates a new variation for every image, so using more epochs with fewer repeats/samples per concept maximizes text variation. These variations help prevent overfitting and improve prompting flexibility. Depending on the model, you may achieve better results with either tag-based or natural language captions.
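
As a rough illustration of how a caption becomes tags (the caption and delimiter here are hypothetical; this is not OneTrainer's code):

```python
# Illustrative sketch: a caption is split on the delimiter into tags.
caption = "1girl, red hair, smiling, outdoors"  # hypothetical caption
delimiter = ","

tags = [t.strip() for t in caption.split(delimiter)]
print(tags)  # ['1girl', 'red hair', 'smiling', 'outdoors']
```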

Tag Shuffling

Randomizes the order of tags within a caption. This helps neutralize any unintended importance of tags placed at the beginning. The Keep Tag Count setting ensures that a specified number of tags remain at the front, preserving key "trigger words" (note that trigger words are not a distinct mechanism in OneTrainer; they are simply tags kept at the front).

Keep Tag Count specifies the number of tags to always keep at the front of the caption. If training a LoRA on a specific concept, it's a good idea to keep that concept's name (aka the "trigger word") at the front so training focuses on it more closely.
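
A minimal sketch of shuffling with Keep Tag Count, assuming a comma delimiter (illustrative, not OneTrainer's implementation):

```python
import random

def shuffle_tags(caption: str, delimiter: str = ",", keep_tag_count: int = 1) -> str:
    """Illustrative sketch of tag shuffling, not OneTrainer's implementation."""
    tags = [t.strip() for t in caption.split(delimiter)]
    kept, rest = tags[:keep_tag_count], tags[keep_tag_count:]
    random.shuffle(rest)  # only the tags after the kept ones are reordered
    return (delimiter + " ").join(kept + rest)

print(shuffle_tags("mychar, red hair, smiling, outdoors"))
# e.g. "mychar, outdoors, red hair, smiling" -- "mychar" stays first
```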

Tag Dropout

Randomly removes some tags to help improve performance on shorter prompts. However, dropping too many tags might reduce the model’s ability to distinguish concepts. Both Keep Tag Count and Delimiter settings apply to this process.

The Probability setting (a value between 0 and 1) controls the chance of dropout, with three modes available (sketched after this list):

  • Full:
    Either removes all tags (except those preserved by Keep Tag Count) with the set probability or leaves the caption unchanged.
  • Random:
    Assesses each tag individually for potential removal.
  • Random Weighted:
    Functions like Random but reduces the chance of dropping tags at the beginning (scaling linearly up to the end).
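
A rough sketch of the three modes (the mode labels and function are illustrative, not OneTrainer's code):

```python
import random

def dropout_tags(tags: list[str], probability: float, mode: str,
                 keep_tag_count: int = 0) -> list[str]:
    """Illustrative sketch of the three dropout modes, not OneTrainer's code."""
    kept, rest = tags[:keep_tag_count], tags[keep_tag_count:]
    if mode == "FULL":
        # One roll: either drop every non-kept tag or leave the caption intact.
        return kept if random.random() < probability else tags
    if mode == "RANDOM":
        # Roll for each tag independently.
        return kept + [t for t in rest if random.random() >= probability]
    if mode == "RANDOM WEIGHTED":
        # Like RANDOM, but the drop chance scales linearly from 0 at the
        # front of the caption up to the full probability at the end.
        n = max(len(rest) - 1, 1)
        return kept + [t for i, t in enumerate(rest)
                       if random.random() >= probability * (i / n)]
    raise ValueError(f"unknown mode: {mode}")

print(dropout_tags(["mychar", "red hair", "smiling"], 0.5, "RANDOM", keep_tag_count=1))
```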

You can also define a list of Special Tags. These tags are either:

  • White-listed: always kept in the caption; everything else is subject to being dropped.
  • Black-listed: only these tags are subject to dropout, while others remain intact.

Special Tags can be provided as a comma-separated list or as a file path (in .txt or .csv format) with one entry per line. If Special Tags Regex is enabled, you can use regex patterns (e.g., "photo.", "\d.", "d.{1}g") to match related tags.

Note: If a tag contains regex special characters (like . ^ $ * + ? ! { } [ ] | ( ) \), escape them with a backslash (e.g., "Panic\! at the Disco").
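
A small illustration of how such patterns can match tags, assuming each pattern must match a whole tag (an assumption; OneTrainer's exact matching behavior may differ):

```python
import re

# Illustrative: matching tags against Special Tags regex patterns.
special_patterns = ["photo.", r"d.{1}g", r"Panic\! at the Disco"]
tags = ["photos", "dog", "dig", "cat", "Panic! at the Disco"]

for tag in tags:
    is_special = any(re.fullmatch(p, tag) for p in special_patterns)
    print(f"{tag!r} -> {is_special}")
# 'photos' -> True  ("photo." is "photo" plus any one character)
# 'dog' -> True, 'dig' -> True  ("d.{1}g" is d, any one character, g)
# 'cat' -> False
# 'Panic! at the Disco' -> True  (the escaped "!" matches literally)
```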

Randomize Capitalization

This feature applies various capitalization styles to tags within a caption. Each tag has a chance to receive one of the following styles, based on the defined probability:

  • capslock: All letters capitalized.
  • title: The first letter of every word capitalized.
  • first: Only the first word capitalized.
  • random: Random capitalization for each letter.

If Force Lowercase is enabled, the entire caption is converted to lowercase before applying any other changes. Note that models using CLIP (such as SD1/2/XL) ignore capitalization changes since they are case-insensitive, while models based on T5 (e.g., SD3, Flux) are affected.
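
A rough sketch of the four styles using Python's string methods (illustrative only; `capitalize()` approximates the "first" style but also lowercases the rest of the tag):

```python
import random

def capitalize_tag(tag: str, style: str) -> str:
    """Illustrative sketch of the capitalization styles, not OneTrainer's code."""
    if style == "capslock":
        return tag.upper()        # ALL LETTERS CAPITALIZED
    if style == "title":
        return tag.title()        # First Letter Of Every Word
    if style == "first":
        return tag.capitalize()   # Only the first letter of the first word
    if style == "random":
        return "".join(random.choice((c.upper(), c.lower())) for c in tag)
    return tag

print(capitalize_tag("red hair", "title"))   # "Red Hair"
print(capitalize_tag("red hair", "random"))  # e.g. "rEd HaIr"
```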

Concept Statistics

This tab displays detailed statistics about your concept’s files and folders. OneTrainer scans the concept folder automatically and saves details to a configuration file. If the contents change, run the scan manually to update the information.

  • Basic Statistics:
    Quickly counts the number of images (and videos), captions, and mask files. It also reports the total file size. This scan is fast and useful for balancing.

  • Advanced Statistics:
    Provides detailed insights such as:

    • Whether images, masks, and captions are correctly paired.
    • The minimum, maximum, and average image resolutions (both in megapixels and dimensions).
    • The length and word count of captions.
    • The distribution of image aspect ratios.

    This scan is more time-consuming, particularly with hard drive storage. Organizing images into subfolders (and enabling "include subdirectories") can improve speed.

Additional details:

  • A "paired" mask or caption shares the same base name as its corresponding image (e.g., "image1.jpg" pairs with "image1-masklabel.png" and "image1.txt"). Unpaired files indicate an issue but do not impact training, since they are ignored (see the sketch at the end of this section).

  • Min/Max/Average Pixels:
    Provides precise dimensions. Images significantly smaller than the model's base resolution (512 for SD1, 1024 for SDXL) may lead to artifacts when upscaled.

  • Length (frames) and FPS (frames per second):
    These apply to videos. If a video's fps does not match the target (e.g., 24 for Hunyuan), it may appear too fast or slow.

  • Min/Max/Average Captions:
    Based on character counts, with an approximate word count provided. Since most models work on token counts (roughly 1–2 tokens per word), captions longer than the model's token limit (e.g., 75 tokens for CLIP-based models) will be truncated.

  • Aspect Bucketing:
    Shows the distribution of image aspect ratios. OneTrainer crops images to a set of aspect ratio buckets during training. Batches consist of images with the same ratio, so if there aren’t enough images of a particular aspect ratio to fill the batch, then all those images will be dropped. To avoid this, either adjust the images manually or increase repeats/samples per epoch.
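
To find unpaired files yourself, a quick script along these lines can help (illustrative only; the folder path and extension set are assumptions, and missing masks only matter for masked training):

```python
from pathlib import Path

# Illustrative check for unpaired files, based on the naming convention
# above: "image1.jpg" pairs with "image1.txt" and "image1-masklabel.png".
concept = Path("path/to/concept")  # hypothetical concept folder
image_exts = {".jpg", ".jpeg", ".png", ".webp"}

for image in sorted(concept.iterdir()):
    if image.suffix.lower() not in image_exts or image.stem.endswith("-masklabel"):
        continue  # skip non-images and the mask files themselves
    caption = image.with_suffix(".txt")
    mask = image.with_name(image.stem + "-masklabel.png")
    if not caption.exists():
        print(f"missing caption: {caption.name}")
    if not mask.exists():
        print(f"missing mask: {mask.name}")  # harmless unless training masked
```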