Aspect Ratio Bucketing - Nerogar/OneTrainer GitHub Wiki

Understanding Aspect Ratio (AR) Bucketing: A Simple Guide

What you have:

  • Training Resolution: 1024 pixels
  • Batch size: 4
  • Pixel Limit (aka Budget): 1,048,576 pixels (because 1024 × 1024 = 1,048,576)

Handling aspect ratios (ARs) properly leads to better image generations. OneTrainer does this with Aspect Ratio Bucketing. Here's how it works.

Creating Aspect Ratio Buckets

  • The program defines a set of buckets relative to the training resolution, using the all_possible_input_aspects list shown further down

Reading Each Image

  • The program looks at every image in your dataset and notes down the width and height

Finding the Best Bucket for Each Image

  • For every image, the program figures out which bucket is the closest fit
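
The matching step can be sketched as a nearest-ratio lookup. This is a minimal illustration, not OneTrainer's actual code: the function name closest_aspect is hypothetical, and measuring "closest" as the absolute difference between width/height ratios is an assumption.

```python
# Aspect ratios as (width, height), copied from the list in this guide.
ASPECTS = [
    (4.0, 1.0), (3.5, 1.0), (3.0, 1.0), (2.5, 1.0), (2.0, 1.0),
    (1.75, 1.0), (1.5, 1.0), (1.25, 1.0), (1.0, 1.0),
    (1.0, 1.25), (1.0, 1.5), (1.0, 1.75), (1.0, 2.0),
    (1.0, 2.5), (1.0, 3.0), (1.0, 3.5), (1.0, 4.0),
]

def closest_aspect(width, height):
    """Return the (w, h) aspect whose ratio is nearest to width / height.

    Assumption: distance is the absolute difference of the two ratios.
    """
    ratio = width / height
    return min(ASPECTS, key=lambda a: abs(a[0] / a[1] - ratio))

print(closest_aspect(1920, 1080))  # (1.75, 1.0)
```

A 1080 x 1920 portrait image lands on (1.0, 1.75) in the same way.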

Adjusting the Image to Fit the Bucket

  • Scaling: If the image is over the pixel budget, it is shrunk (scaled down) proportionally to fit within the budget.
  • Cropping:
    • If scaling alone cannot make the image fit, one dimension (width or height) is cropped evenly.
    • The crop amount is limited in practice by the number of buckets and the training resolution, because the resolution of each bucket is derived from the training resolution.
    • If the crop jitter augmentation is enabled, the required cropping is distributed randomly in one or more dimensions.
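
As a rough sketch of the cropping step, the function below computes where the crop window sits once an image has been scaled. The name crop_box and its parameters are hypothetical, and modelling crop jitter as a uniformly random offset is an assumption. It is not taken from OneTrainer's source.

```python
import random

def crop_box(scaled_w, scaled_h, bucket_w, bucket_h, jitter=False, rng=random):
    """Return (left, top, right, bottom) for cropping a scaled image to the bucket size."""
    extra_w = scaled_w - bucket_w
    extra_h = scaled_h - bucket_h
    if extra_w < 0 or extra_h < 0:
        raise ValueError("scaled image is smaller than the bucket")
    if jitter:
        # Assumed jitter behaviour: place the crop window at a random offset.
        left = rng.randint(0, extra_w)
        top = rng.randint(0, extra_h)
    else:
        # Even crop: remove (roughly) the same amount from both sides.
        left = extra_w // 2
        top = extra_h // 2
    return left, top, left + bucket_w, top + bucket_h

print(crop_box(1365, 768, 1344, 768))  # (10, 0, 1354, 768)
```

With jitter=True the window keeps the same size, but its left edge can fall anywhere between 0 and 21 for these sizes.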

In summary, the goal is to make the smallest possible adjustments to each image.

all_possible_input_aspects in (width, height)

  • (4.0, 1.0),
  • (3.5, 1.0),
  • (3.0, 1.0),
  • (2.5, 1.0),
  • (2.0, 1.0),
  • (1.75, 1.0), closest to Common Widescreen (16:9 ≈ 1.78)
  • (1.5, 1.0),
  • (1.25, 1.0),
  • (1.0, 1.0), Square
  • (1.0, 1.25),
  • (1.0, 1.5), Common Portrait
  • (1.0, 1.75),
  • (1.0, 2.0),
  • (1.0, 2.5),
  • (1.0, 3.0),
  • (1.0, 3.5),
  • (1.0, 4.0)
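
To show how an aspect ratio can become a concrete bucket resolution, here is a minimal sketch. The function name bucket_resolution is hypothetical, and snapping each side down to a multiple of 64 is an assumption, chosen because it reproduces the 1344 x 768 size implied by the worked example in this guide.

```python
import math

def bucket_resolution(aspect, pixel_budget=1024 * 1024, quantization=64):
    """Turn a (width, height) aspect into a pixel resolution within the budget.

    Assumption: each side is snapped down to a multiple of `quantization`.
    """
    ratio = aspect[0] / aspect[1]
    # Solve width * height = pixel_budget with width = ratio * height.
    height = math.sqrt(pixel_budget / ratio)
    width = height * ratio

    def snap(value):
        return int(value // quantization) * quantization

    return snap(width), snap(height)

print(bucket_resolution((1.75, 1.0)))  # (1344, 768)
print(bucket_resolution((1.0, 1.0)))   # (1024, 1024)
```

Snapping down rather than rounding to nearest keeps every bucket at or under the pixel budget.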

Example ‘Dataset’ of 4 Images

  • 16:9
    • 1280 x 720
    • 1366 x 768
    • 1600 x 900
    • 1920 x 1080

Let's use the 1920 x 1080 image as an example.

16:9 - 1920 x 1080 (2.073M pixels) Example

  1. Determine the AR: divide the image width by its height, 1920 / 1080 ≈ 1.78
  2. Looking at the available buckets, our closest match is 1.75:1; however, it is not an exact match
  3. The image is also over our pixel budget, so we proportionally scale it down to 1365 × 768 (1.048M pixels)
  4. To make it fit the 1.75:1 bucket we must reduce its width to 768 × 1.75 = 1344, so we crop 21 pixels from the width
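
The numbers in these steps can be checked with a few lines of arithmetic. This is only a verification sketch: the variable names are illustrative, and rounding to the nearest pixel is an assumption.

```python
import math

width, height = 1920, 1080
budget = 1024 * 1024

# Step 1: aspect ratio of the source image.
ratio = width / height

# Step 3: uniform scale factor that brings the pixel count down to the budget.
scale = math.sqrt(budget / (width * height))
scaled_w = round(width * scale)    # 1365
scaled_h = round(height * scale)   # 768

# Step 4: width required by the 1.75:1 bucket at this height, and the crop.
bucket_w = round(scaled_h * 1.75)  # 1344
crop = scaled_w - bucket_w         # 21

print(round(ratio, 2), scaled_w, scaled_h, bucket_w, crop)
```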

Conclusion

  • An aspect ratio bucket is an aspect ratio adjusted to the pixel budget.
  • Images are scaled / cropped to match the closest possible bucket.
  • During training, a batch can only be filled with images from the same bucket. This explains why images may be dropped when the batch size is greater than 1 and the dataset contains images with different aspect ratios.
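
The last point can be illustrated with a small sketch that groups images by bucket and forms full batches. The function make_batches is hypothetical, and dropping every incomplete batch is an assumption made for illustration. OneTrainer's real batching logic may differ.

```python
from collections import defaultdict

def make_batches(image_buckets, batch_size):
    """Group (image_name, bucket) pairs into same-bucket batches.

    Returns (batches, dropped). Assumption: images that cannot fill a
    complete batch within their bucket are dropped.
    """
    by_bucket = defaultdict(list)
    for name, bucket in image_buckets:
        by_bucket[bucket].append(name)

    batches, dropped = [], []
    for names in by_bucket.values():
        full = len(names) - len(names) % batch_size
        for i in range(0, full, batch_size):
            batches.append(names[i:i + batch_size])
        dropped.extend(names[full:])
    return batches, dropped

# Five widescreen images and one square image, batch size 4.
dataset = [(n, (1344, 768)) for n in "abcde"] + [("f", (1024, 1024))]
batches, dropped = make_batches(dataset, batch_size=4)
print(batches)  # [['a', 'b', 'c', 'd']]
print(dropped)  # ['e', 'f']
```

With a batch size of 1 nothing is dropped, because every image fills a batch on its own.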