Aspect Ratio Bucketing - Nerogar/OneTrainer GitHub Wiki

Understanding Aspect Ratio (AR) Bucketing: A Simple Guide

Handling aspect ratios (ARs) properly leads to better image generations. OneTrainer does this with aspect ratio bucketing. Here's how it works.

OneTrainer defines its buckets relative to the training resolution, using the aspect ratios in all_possible_input_aspects.

The program looks at every image in your dataset and notes its width and height.

Scaling: If the image is over the pixel budget, it is shrunk (scaled down) to fit within that budget.
Cropping:
- If scaling alone cannot make the image fit its bucket, one dimension (width or height) can be cropped evenly. The crop amount is limited in practice by the number of buckets and the training resolution (the resolution of each bucket is derived from the training resolution). If the crop jitter augmentation is enabled, the required cropping is distributed randomly in one or more dimensions.

In summary, we try to make the smallest possible adjustments to the image.
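The scaling and cropping steps can be sketched roughly as follows. This is a minimal illustration, not OneTrainer's actual code: the function name `scale_and_crop`, its parameters, and the 1216 × 832 target in the usage note are assumptions made for the example. It also assumes that "evenly" means the crop is split between both sides, and that crop jitter picks a random offset within the surplus.

```python
import random


def scale_and_crop(img_w, img_h, target_w, target_h, jitter=False, rng=None):
    """Return (scaled_w, scaled_h, left, top) for fitting an image to a target.

    Hypothetical helper for illustration only.
    """
    # Scale proportionally so the image covers the target in both dimensions.
    scale = max(target_w / img_w, target_h / img_h)
    scaled_w = max(target_w, round(img_w * scale))
    scaled_h = max(target_h, round(img_h * scale))

    # Whatever sticks out past the target has to be cropped away.
    extra_w = scaled_w - target_w
    extra_h = scaled_h - target_h

    if jitter:
        # Assumed jitter behaviour: random crop offset within the surplus.
        rng = rng or random.Random()
        left = rng.randint(0, extra_w)
        top = rng.randint(0, extra_h)
    else:
        # Even crop: split the surplus between both sides.
        left = extra_w // 2
        top = extra_h // 2
    return scaled_w, scaled_h, left, top
```

For a 3000 × 2000 image and a 1216 × 832 target, this scales the image to 1248 × 832 and crops 32 pixels of width, 16 from each side.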

(4.0, 1.0),
(3.5, 1.0),
(3.0, 1.0),
(2.5, 1.0),
(2.0, 1.0),
(1.75, 1.0), Common Widescreen (16/9)
(1.5, 1.0),
(1.25, 1.0),
(1.0, 1.0), Square
(1.0, 1.25),
(1.0, 1.5), Common Portrait
(1.0, 1.75),
(1.0, 2.0),
(1.0, 2.5),
(1.0, 3.0),
(1.0, 3.5),
(1.0, 4.0)
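
As a rough sketch of how a bucket resolution could be derived from one of these aspect pairs and the training resolution: keep the pixel count close to the square of the training resolution, then snap each side to a fixed step. The helper `bucket_resolution`, the step of 64, rounding down, and reading each pair as (width, height) are all assumptions made for illustration; OneTrainer's own code may differ.

```python
import math

# Aspect pairs from the list above, read here as (width, height).
ALL_ASPECTS = [
    (4.0, 1.0), (3.5, 1.0), (3.0, 1.0), (2.5, 1.0), (2.0, 1.0),
    (1.75, 1.0), (1.5, 1.0), (1.25, 1.0), (1.0, 1.0),
    (1.0, 1.25), (1.0, 1.5), (1.0, 1.75), (1.0, 2.0),
    (1.0, 2.5), (1.0, 3.0), (1.0, 3.5), (1.0, 4.0),
]


def bucket_resolution(aspect, train_res=1024, quantization=64):
    """Hypothetical helper: turn an aspect pair into a (width, height) bucket."""
    w_ratio, h_ratio = aspect
    root = math.sqrt(w_ratio / h_ratio)
    # width * height stays at roughly train_res ** 2 before quantization.
    width = train_res * root
    height = train_res / root
    # Round each side down to a multiple of the quantization step (assumed).
    return (int(width // quantization) * quantization,
            int(height // quantization) * quantization)


buckets = {aspect: bucket_resolution(aspect) for aspect in ALL_ASPECTS}
```

Under these assumptions, a training resolution of 1024 gives 1024 × 1024 for the square entry, 1344 × 768 for the 1.75:1 entry, and 832 × 1216 for the 1:1.5 entry.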

Let's use a 1920 × 1080 image as an example.

Determine the AR: divide the image width by its height, 1920 / 1080 ≈ 1.78.
Looking at the available buckets, our closest match is 1.75:1; however, it's not an exact match.
The image is also over our pixel budget, so we proportionally scale it down to 1365 × 768 (1.048M pixels).
Now, to make it fit the 1.75:1 bucket, we must reduce its width, so we crop 21 pixels from the width, leaving 1344 × 768.
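
The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: `closest_aspect` is a hypothetical helper, "closest" is assumed to mean the smallest absolute difference in width / height, and the 1344 × 768 bucket size is simply what remains after the 21-pixel crop described above.

```python
# Aspect pairs from the list above, read here as (width, height).
ALL_ASPECTS = [
    (4.0, 1.0), (3.5, 1.0), (3.0, 1.0), (2.5, 1.0), (2.0, 1.0),
    (1.75, 1.0), (1.5, 1.0), (1.25, 1.0), (1.0, 1.0),
    (1.0, 1.25), (1.0, 1.5), (1.0, 1.75), (1.0, 2.0),
    (1.0, 2.5), (1.0, 3.0), (1.0, 3.5), (1.0, 4.0),
]


def closest_aspect(img_w, img_h):
    """Hypothetical helper: pick the aspect pair nearest to the image's ratio."""
    ratio = img_w / img_h
    return min(ALL_ASPECTS, key=lambda a: abs(a[0] / a[1] - ratio))


img_w, img_h = 1920, 1080
aspect = closest_aspect(img_w, img_h)        # (1.75, 1.0)

bucket_w, bucket_h = 1344, 768               # 1.75:1 bucket from the example
# Scale proportionally until the image just covers the bucket.
scale = max(bucket_w / img_w, bucket_h / img_h)
scaled_w = round(img_w * scale)              # 1365
scaled_h = round(img_h * scale)              # 768
crop_w = scaled_w - bucket_w                 # 21 pixels cropped from the width

print(aspect, (scaled_w, scaled_h), scaled_w * scaled_h, crop_w)
```

The scaled image has 1365 × 768 = 1,048,320 pixels, which matches the 1.048M figure quoted above.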

An aspect ratio bucket is an aspect ratio adjusted to the pixel budget.
Images are scaled and cropped to match the closest possible bucket.
During training, a batch can only be filled with images from the same bucket. This is why some images may be dropped when the batch size is greater than 1 and the dataset contains images with different aspect ratios.
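
To see why images can be dropped, here is a minimal sketch of batching by bucket. The function `make_batches` and the rule of discarding each bucket's leftover images are assumptions made for illustration; OneTrainer's actual handling of incomplete batches may differ.

```python
from collections import defaultdict


def make_batches(image_buckets, batch_size):
    """Group images by bucket and build full batches (hypothetical helper).

    image_buckets maps an image name to its (width, height) bucket.
    Returns (batches, dropped).
    """
    by_bucket = defaultdict(list)
    for name, bucket in image_buckets.items():
        by_bucket[bucket].append(name)

    batches, dropped = [], []
    for names in by_bucket.values():
        # Number of images in this bucket that fit into full batches.
        full = len(names) - len(names) % batch_size
        for i in range(0, full, batch_size):
            batches.append(names[i:i + batch_size])
        # Leftovers cannot be mixed with images from another bucket.
        dropped.extend(names[full:])
    return batches, dropped


images = {
    "a": (1024, 1024), "b": (1024, 1024), "c": (1344, 768),
    "d": (1024, 1024), "e": (1344, 768), "f": (832, 1216),
}
batches, dropped = make_batches(images, batch_size=2)
```

With a batch size of 2, the six images above form two full batches (a with b, c with e), while d and f are left over because no other image shares their bucket. With a batch size of 1, nothing is left over.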