Basics

Use the toolbar at the top of the Main Window to select tools and toggle windows.

Main Window

There's a burger menu (☰) in the top left corner of the Main Window:
- Shows available keyboard shortcuts (global shortcuts that work in all windows).
- Use the Model Settings to configure your models.
- Select which file types to load (images/videos).
Load files or folders by dragging them into the Main Window.
- Hold SHIFT while dropping to append files.

Image Viewer

Zoom in and out with the mouse wheel.
Pan the image by dragging with the mouse.

These controls are available in all tools, but some tools may replace them with optimized controls.
For example, the Crop Tool uses the mouse wheel for adjusting the crop region instead of zooming.

In these cases, you can hold CTRL to switch back to zooming and panning.

Reset the view (zoom and pan) with the 0 key.

To navigate between images, press the LEFT or RIGHT arrow key, or the forward/backward mouse button.
Pressing the UP and DOWN arrow keys will navigate to the next or previous folder.

When another window is focused, like when editing captions in the Gallery's list view, you can change the image with Ctrl+PageDown and Ctrl+PageUp.

Video Player

The zoom and pan controls for images also apply to videos.

Press SPACE to toggle play/pause.

Seek Bar

For videos, the Main Window will additionally show a seek bar at the bottom.
Move the mouse to the lower part of the window to expand the seek bar. It will show thumbnails for the hovered timestamp.

Left click on the seek bar to jump to that timestamp in the video.
Middle click on the seek bar to toggle play/pause.
Use the mouse wheel on the seek bar to skip 5 seconds forward/back.
- Or if the video is short, skip max 20% of the duration.
- While paused, it will skip one frame.
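
The wheel-step rules above can be summarized in a short sketch. The function name and its parameters are hypothetical, and it assumes that one frame lasts 1/fps seconds. It is not qapyq's actual code.

```python
def wheel_skip_seconds(duration_s: float, fps: float, paused: bool) -> float:
    """Distance in seconds covered by one mouse wheel step on the seek bar."""
    if paused:
        # While paused, one wheel step moves by a single frame.
        return 1.0 / fps
    # During playback: 5 seconds, but at most 20% of the video duration.
    return min(5.0, 0.2 * duration_s)
```

For a 10-second clip, one wheel step during playback would therefore cover 2 seconds instead of 5.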

Right click on the seek bar to open a menu with:

Keyframe navigation (these might have better quality and sometimes align with scene transitions)
Playback speed control

Volume Control

While moving the mouse in the Main Window, a volume symbol will briefly appear above the seek bar.

Use the mouse wheel on the volume symbol to adjust the volume.
Click on the symbol to toggle mute.

If the video has no audio track, the volume symbol won't show.

Disable playback to save VRAM

Playing videos will allocate VRAM for buffers even when hardware acceleration is disabled.
To save VRAM for loading captioning models, you can disable video playback in the Main Window's menu.

When playback is disabled, it will only show single extracted frames. Seek bar navigation behaves the same as when paused.

Tools

In addition to the buttons in the toolbar, you can also switch between tools by pressing Ctrl+1 to Ctrl+7.

Slideshow Tool

In this viewing mode you can navigate through images with the mouse wheel.

Press SPACE to start or pause the automatic image changing.

Measure Tool

Right click starts and freezes the measurement.
The status bar at the bottom of the Main Window displays the measured values.

Useful for estimating:
- Mask grow/shrink amount
- Blur radius/kernel size
- Area threshold for the "Filled Area" mask operation
- ...

Compare Tool

Load a second image by dropping it onto the right side of the Main Window, or by clicking on an image in the Gallery while holding the ALT key.

The two images are displayed on top of each other. Use the mouse to move the dividing line. The left side of the line shows the current image.

Enable the difference overlay to see which regions are different between the two images. Black regions are identical, bright regions are different.
If the two images have different aspect ratios, the overlay is restricted to the intersection. Exporting the overlay will only save this intersection.

VAE Reconstruction

You can also load a VAE (Variational Autoencoder) model and process the current image with it: The image will be encoded into latent space and then decoded back to pixel space.
This reconstruction shows how a model sees the image during training. If your model struggles to learn certain details, use the VAE to check if those details can even be represented in latent space.
Newer VAEs (Flux, Qwen) are very accurate, but the SD1.5 and SDXL VAEs are rather poor at encoding fine details.

Some models reuse an existing VAE from another model. For example, Chroma, HiDream and Z-Image all use the Flux.1 VAE.

VAEs come in different formats. While most image generation tools include their own logic to deal with different names for the contained layers, qapyq simply uses the diffusers model loader for now.
If the VAE fails to load, you can try the VAE from the original model's repo on Hugging Face.

Crop Tool

With this tool you can select a region of the image and save that region to a new file, scaled to the selected target size.

Quick instructions:

Load a bunch of images.
Set the export path by clicking on the path preview in the bottom right.
- The preview shows the destination path.
- When overwriting files is enabled, the preview will turn red when the file already exists.
- When overwriting is disabled, an increasing counter (_001) is appended to the filename.
Set the target size at the top of the toolbar.
- Exported images will be scaled to exactly this size.
Use the mouse to select the crop region in the image.
Press the left mouse button twice.
- The first click fixes the selection, the second click confirms it.
- To reset the selection, left click outside of the selected region, or use the "Reset Selection" option in the right-click context menu.
- A green flash effect confirms that the export has started.
- The status bar will show a notification when the export has finished.

Toolbar

The toolbar on the right side of the window shows the target size at the top. This is the exact export size.
Using the list of size presets (labeled as "Pre:"), you can quickly change the target size. To edit the presets, open the list and select Setup Sizes... at the bottom.

Below that, it shows the selection size, which is the size of the currently selected region in the image. This region will be scaled to the target size, and the green or red number with the triangle shows the scaling factor.
Downscaling to a factor below 1 is recommended, because it involves less quality loss. Upscaling images results in worse quality and is therefore displayed as red.

To prevent upscaling, the selection size will have a minimum that equals your selected target size.
You can enable Allow Upscale to further shrink your selection.
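
As a rough illustration of the scaling factor shown in the toolbar, the sketch below divides the target width by the selection width. The function names are hypothetical. It assumes that the selection has the same aspect ratio as the target size, and that a factor of exactly 1.0 counts as "no upscaling".

```python
def scale_factor(selection_w: int, target_w: int) -> float:
    """Factor by which the selected region is scaled on export."""
    return target_w / selection_w


def factor_color(factor: float) -> str:
    """Green for downscaling (or no scaling), red for upscaling."""
    return "red" if factor > 1.0 else "green"
```

Selecting a region that is 2048 px wide for a 1024 px target gives a factor of 0.5 (green), while a 512 px selection gives 2.0 (red).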

When Constrain to Image is enabled, which is the default, the selection cannot exit the image boundaries.
When it is disabled and outside regions are selected, they will be exported as transparency if the format allows for that, or black otherwise.

The rotation can be adjusted in 90° steps through the shortcut buttons. Alternatively, you can adjust it in finer steps of 3° with the mouse wheel over the slider, or in finest steps of 0.1° using the mouse wheel over the spin box that displays the number.

The toolbar has a section "Export Settings" which is hidden by default. Click on that title to expand the section. There, you can select a scaling preset which defines the interpolation method used when exporting images.
The setting for "Path" can be changed to "Dialog", which will ask for the path before saving each image.
Read more about it here: Image Export

After an image has been exported, you can open it in a new tab using the Open Last File button. This is useful, for example, if you want to create a mask for it, though it's often faster to create all the masks together after cropping.

Controls

Use the mouse to select the crop region in the image. Click two times to export the image.

Adjust the selection size with the mouse wheel:

Hold SHIFT to adjust in 1px steps.
Hold CTRL to pan and zoom instead.

After fixing the selection by clicking with the mouse once, the selection rectangle can be adjusted:

Arrow keys: Move the position of the selection.
CTRL+Arrow key: Grow the selection in the direction of the arrow key.
ALT+Arrow key: Shrink the selection from the direction of the arrow key.
Hold SHIFT to do the above in bigger steps.
Click on the selection or press Ctrl+E to export the cropped image.

Pressing the right mouse button will open a context menu for quick access to size presets.

Pressing the middle mouse button (or clicking the Swap button) will swap the target size from vertical to horizontal or vice versa.
When another size preset is selected, this orientation is kept.

Video Cropping

When cropping videos, you can choose to export single frames or video clips:

To export a single frame as image, the destination path has to end with an image extension (.png, .jpg, etc.).
To export videos, the destination path has to end with a video extension (e.g. .mp4).

Export single frames from videos

Select the crop region with the mouse. When you click, the video will pause on the exact frame that will be exported.
You can then use the mouse wheel on the seek bar at the bottom to change the frame but keep the selected crop region.

Right click on the seek bar to open a menu for keyframe navigation. Keyframes might have better quality and they sometimes align with scene changes.

After choosing the frame and fixing the selection, click again or press Ctrl+E to export the image.

Export video clips

Choose the crop region like you would do for images. When you click, the current frame is selected as the starting frame. It fixes the time range and shows a green rectangle on the video seek bar at the bottom. This is in addition to the crop region, which means you can crop and trim videos by spatial size and temporal position/duration. It will also apply the selected rotation.

After fixing the selection, the playback will loop the selected time range.
When pausing the video, it will reset the position to the beginning of the selected time range.
You can skip to any position within this time range without clearing the selection. When you manually skip outside of the range, the selection is cleared.

While paused, you can use the mouse wheel on the seek bar to skip single frames. The video will loop, thus skipping one frame backwards from the first frame will skip to the last frame instead. This provides a preview of the exported clip, though it's not 100% frame-accurate.

Hold CTRL while using the mouse wheel on the seek bar to move the selected time range.
While paused, this will move the time range by one frame.

Right click on the seek bar to open a menu for keyframe navigation. Use the buttons to align the start of the selected time range with keyframes.

After choosing a video extension for the export path, the toolbar will show additional settings for selecting the exported time range:

Time Range
- Len: This is the length of the exported clip in frames.
  - The label to the right shows the duration in seconds which depends on the chosen FPS and speed (see below).
  - You can create size presets that include this length, e.g. 1280x720x49 where 49 is the frame count.
- Clicking the Set: End Frame button will skip the video back in time by the displayed duration in seconds.
  - When you then click and fix the selection, the selected time range ends at the position where you clicked this button.
  - Use this button to choose the last frame for the exported clip.
- Change Speed: Apply the playback speed to the exported clip.
  - When enabled, the time selection rectangle will expand or shrink depending on the speed setting in the seek bar's right click menu, while the number of frames stays constant. This can help to fit a bit more content into a video clip so it captures the full motion.
  - When disabled, the speed setting only affects the preview. Increasing the speed can save time when preparing clips, but the exported video will have the original speed.
Export Settings
- FPS: The frames per second of the exported video.
  - If the target FPS is larger than 133% of the effective source FPS (= video FPS * speed), it will use ffmpeg's minterpolate filter to interpolate frames. This can take a minute for short clips and easily hours for whole movies (don't do that), but it's worth it for training. Otherwise it would train on duplicated frames which could introduce stutter in the generated videos.
- The scaling preset is ignored when exporting videos. It will always use ffmpeg's Lanczos filter to scale the videos.
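
The frame interpolation rule for the FPS setting can be written down as a small check. This is a sketch with hypothetical function names. Only the 133% threshold and the use of ffmpeg's minterpolate filter come from the description above. Falling back to ffmpeg's plain fps filter is an assumption made for illustration.

```python
def effective_source_fps(video_fps: float, speed: float = 1.0) -> float:
    """Effective source FPS = video FPS * speed."""
    return video_fps * speed


def needs_frame_interpolation(target_fps: float, video_fps: float, speed: float = 1.0) -> bool:
    """True if the target FPS exceeds 133% of the effective source FPS."""
    return target_fps > 1.33 * effective_source_fps(video_fps, speed)


def fps_filter(target_fps: float, video_fps: float, speed: float = 1.0) -> str:
    """Build an ffmpeg video filter string for the frame rate conversion."""
    if needs_frame_interpolation(target_fps, video_fps, speed):
        # Motion-interpolated frames (slow).
        return f"minterpolate=fps={target_fps:g}"
    # Assumption: otherwise a plain frame rate conversion is enough.
    return f"fps={target_fps:g}"
```

Exporting a 24 FPS video at 30 FPS stays below the threshold (30 < 31.92), while exporting it at 60 FPS, or at 30 FPS with a speed of 0.5, would trigger interpolation.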

Scale Tool

Right clicking onto the image or pressing Ctrl+E saves it at the defined destination.

The different scaling modes define the target size based on the original image size:

Fixed
- Scale to exactly this size. May change aspect ratio.
Fixed Width/Height
- Scale to a fixed width/height and calculate the other side based on aspect ratio.
Fixed Smaller/Larger Side
- The shorter/longer side is set to the specified length. The other side is calculated based on the aspect ratio.
Factor
- Multiply both width and height by this factor.
Pixel Count
- Scale to a total number of pixels while preserving the aspect ratio.
Quantized
- Scale both sides by a factor and make them a multiple of Q. This may change the aspect ratio.
- Closest/Wider/Taller defines in which way the size is rounded.
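
To make the scaling modes more concrete, here is a minimal sketch of how some of them could compute the target size. All function names are hypothetical and the rounding behavior is an assumption. For the Quantized mode, only a "Closest" variant is shown that rounds each side to the nearest multiple of Q.

```python
import math


def fixed_width(w: int, h: int, width: int) -> tuple[int, int]:
    """Fixed Width: the height follows from the aspect ratio."""
    return width, round(h * width / w)


def fixed_smaller_side(w: int, h: int, length: int) -> tuple[int, int]:
    """Fixed Smaller Side: the shorter side is set to the given length."""
    scale = length / min(w, h)
    return round(w * scale), round(h * scale)


def by_factor(w: int, h: int, factor: float) -> tuple[int, int]:
    """Factor: multiply both sides by the same factor."""
    return round(w * factor), round(h * factor)


def pixel_count(w: int, h: int, pixels: int) -> tuple[int, int]:
    """Pixel Count: scale to a total number of pixels, keeping the aspect ratio."""
    scale = math.sqrt(pixels / (w * h))
    return round(w * scale), round(h * scale)


def quantized_closest(w: int, h: int, factor: float, q: int) -> tuple[int, int]:
    """Quantized (Closest): scale, then snap each side to a multiple of q."""
    def snap(value: float) -> int:
        return max(q, round(value / q) * q)
    return snap(w * factor), snap(h * factor)
```

For example, a 1000x750 image quantized with factor 1.0 and Q=64 becomes 1024x768 in this sketch. The aspect ratio happens to stay the same here, but in general it may change.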

The toolbar and its export settings are otherwise similar to those in the Crop Tool described in the last chapter.

Mask Tool

This tool is for creating and editing greyscale masks for the loaded image. The mask can consist of up to 4 layers, which are exported as the RGBA channels of the resulting image.

Upon loading or changing the image, it attempts to load an existing mask using the defined path template.

Changes to all the mask's layers and their history are kept in memory even when selecting another image. New or loaded masks are not stored, but editing a mask will use additional memory. Although the history is compressed, this may still consume a significant amount of RAM when many masks are edited.
Reloading the mask using the Reload button, or (re)loading a folder will clear the cached masks and history.

The Export button in the bottom right will turn red when changes are made.
To save the current layers, click this Export button or press Ctrl+E.

Operations

The following operations are provided for editing. If not specified, a left or right click into the image will apply the operation.

Basic Operations:

Brush
- Draw brush strokes while holding left mouse button.
- Erase while holding right mouse button.
- Adjust brush size with mouse wheel. Hold SHIFT to adjust in steps of 10.
- Supports pressure-sensitive pens and graphics tablets (pressure adjusts color).
Rectangle
- Draw a rectangle while holding left mouse button.
- Erase: Draw black rectangle while holding right mouse button.
Flood Fill
- Fill region with the selected color (left click).
- Fill region with black (right click).
Clear
- Reset whole layer to the selected color (left click).
- Reset to the inverse of the selected color (right click).
Invert
Threshold
- Pixels above the selected color are set to 1.0.
- Pixels below the selected color are set to 0.0.
Normalize
- The range of values is adjusted to fit into the selected min and max color.
Quantize
- Applies the selected mode to blocks of the mask and creates a coarse, pixelized grid.
- This is supposed to make drawing masks more predictable with regard to the VAE compression, but the training tool has to handle it by using the right interpolation method (nearest) for scaling.
Morphology
- Grow (dilate), Shrink (erode), Close Holes (closing), Open Gaps (opening)
- The border setting defines how pixels outside the image are interpreted.
  - Reflect: fedcba|abcdefgh|hgfedcb
  - Replicate: aaaaaa|abcdefgh|hhhhhhh
  - Const Black / Const White treats outside pixels as 0.0 and 1.0, respectively.
Gaussian Blur
- Softens the layer
- Direction "outwards" keeps bright pixels bright.
- Direction "inwards" keeps dark pixels dark.
Blend Layers
- Blends another layer into the currently active layer.
- Most of the modes are commutative, meaning the result is the same whether A is blended into B, or B into A.
  - The exception is "Subtract", which subtracts the source layer from the active layer.
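
The border modes of the Morphology operation are easier to see on a single row of pixels. The following sketch reproduces the Reflect and Replicate patterns listed above for a row given as a string. The function is hypothetical and only illustrates the padding. It assumes the padding is not longer than the row itself.

```python
def pad_row(row: str, left: int, right: int, mode: str) -> str:
    """Show how pixels outside the image are filled for one row."""
    if mode == "reflect":
        # Mirror the row at its edges, including the edge pixel itself.
        left_pad = row[:left][::-1]
        right_pad = row[len(row) - right:][::-1]
    elif mode == "replicate":
        # Repeat the outermost pixel.
        left_pad = row[0] * left
        right_pad = row[-1] * right
    else:
        raise ValueError(f"unknown border mode: {mode}")
    return f"{left_pad}|{row}|{right_pad}"


print(pad_row("abcdefgh", 6, 7, "reflect"))    # fedcba|abcdefgh|hgfedcb
print(pad_row("abcdefgh", 6, 7, "replicate"))  # aaaaaa|abcdefgh|hhhhhhh
```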

Generate:

Detect Padding
- Detects padding or borders in the image and fills these regions with the selected color in the mask.
- The padding regions in the image need to have a similar color, e.g. black. The tolerance setting can be increased if the colors are too different.
- The min and max color settings define the range of colors in the image that can be detected as padding (the image is converted to greyscale for the detection). Use this to only detect black borders and ignore white borders, for example.
- It only works for straight padding at the top/left/right/bottom. It can't fully detect diagonal padding for rotated images.
Centroid Rectangle
- Creates a maximally enlarged rectangle around the centroid of white pixels in the mask.
  - Helps with cropping images to new aspect ratios while keeping the region of interest (use a detection model).
- The rectangle will have the defined aspect ratio. Its orientation depends on the orientation of the image.
- All pixels above the value 0 are used to calculate the geometric center.
- If the mask is completely black, the rectangle is placed at the center of the image.
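
The centroid used by Centroid Rectangle can be sketched as follows. The mask is represented as a nested list of values between 0.0 and 1.0, and the function name is hypothetical. Every pixel above 0 counts equally, regardless of its brightness. Returning the center in pixel-index coordinates for an empty mask is an assumption.

```python
def mask_centroid(mask: list[list[float]]) -> tuple[float, float]:
    """Geometric center (x, y) of all pixels with a value above 0."""
    height, width = len(mask), len(mask[0])
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value > 0:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        # Completely black mask: fall back to the image center.
        return (width - 1) / 2, (height - 1) / 2
    return sum_x / count, sum_y / count
```

The rectangle with the defined aspect ratio would then be placed around this point. How exactly it is enlarged and kept inside the image is not covered by this sketch.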

Conditions:

Condition: Color Range
- Retrieves the minimum and maximum color values from the mask layer and compares them to the specified range.
- Sets the whole layer to white if the values are inside the range.
- Sets the whole layer to black otherwise.
- Use this in macros to, for example, conditionally invert an empty black input layer to white when some images were manually masked and others were not.
Condition: Filled Area
- Counts the non-zero pixels in the mask layer and compares the area (in percentage) to the specified range.
- Sets the whole layer to white if the area is inside the range, to black otherwise.
- Can be used in macros to discard wrong detections. Florence2 will always return a result, even when the prompted object is not present. Typically, the wrong box will have a large area.
- Use the Measure Tool to estimate the area percentage.
Condition: Region Count
- Retrieves the number of connected white regions from the mask layer and compares it to the specified range.
- Sets the whole layer to white if the values are inside the range, to black otherwise.
- Use this together with "Detect" and "Blend Layers" in macros to, for example, conditionally skip images with multiple people. Or to only process images where the head and feet are detected.
- All pixel values above 0 will connect regions. If your background is not black, you may need to binarize your mask first using "Threshold".
- You can copy the layer with the detections to a new layer (Blend Layers - Add), do the processing there and evaluate the condition, then multiply the result back into the original layer.
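
The Filled Area and Region Count conditions can be approximated with a few lines of plain Python. This is a sketch with hypothetical names, not qapyq's implementation. It assumes 4-connectivity for regions and an inclusive range check.

```python
from collections import deque


def filled_area_percent(mask: list[list[float]]) -> float:
    """Percentage of pixels with a value above 0."""
    total = sum(len(row) for row in mask)
    filled = sum(1 for row in mask for value in row if value > 0)
    return 100.0 * filled / total


def count_regions(mask: list[list[float]]) -> int:
    """Number of connected regions of pixels above 0 (4-connectivity assumed)."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] <= 0 or seen[y][x]:
                continue
            regions += 1
            seen[y][x] = True
            queue = deque([(x, y)])
            while queue:  # flood fill the whole region
                cx, cy = queue.popleft()
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < width and 0 <= ny < height and mask[ny][nx] > 0 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
    return regions


def condition(value: float, range_min: float, range_max: float) -> float:
    """Layer value after the condition: white (1.0) inside the range, black (0.0) otherwise."""
    return 1.0 if range_min <= value <= range_max else 0.0
```

A macro that should skip images with more than one detection would correspond to `condition(count_regions(mask), 1, 1)`.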

User-Defined:

Detect
- Run a model that detects boxes (e.g. face detection).
Segment
- Run a model that segments the image with pixel-accuracy (e.g. masks foreground and removes background).
Run Macro
- Replay a previously recorded macro.

History

A history of operations is kept per layer.
When you step back in the history and then apply a new operation, all later history entries are deleted.

Up to 20 steps can be undone. The maximum history length can be configured in qapyq_config.json by changing the mask_history_size setting.
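
The history behavior described above resembles a classic undo stack with a size limit. The class below is a simplified sketch with hypothetical names. It stores operation names instead of the actual (compressed) layer states.

```python
class LayerHistory:
    """Per-layer history with a maximum length (20 by default)."""

    def __init__(self, max_size: int = 20):
        self.max_size = max_size
        self.entries: list[str] = []
        self.position = 0  # number of entries that are currently applied

    def apply(self, operation: str) -> None:
        # Applying a new operation discards all entries that were undone.
        del self.entries[self.position:]
        self.entries.append(operation)
        if len(self.entries) > self.max_size:
            del self.entries[0]  # drop the oldest entry
        self.position = len(self.entries)

    def undo(self) -> bool:
        if self.position == 0:
            return False
        self.position -= 1
        return True

    def redo(self) -> bool:
        if self.position == len(self.entries):
            return False
        self.position += 1
        return True

    def applied(self) -> list[str]:
        return self.entries[:self.position]
```

After applying "brush", "blur" and "invert", undoing twice and then applying "fill", only "brush" and "fill" remain. The two undone entries can no longer be restored.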

Macro Recording

Macros are sets of operations that describe how a mask is formed. They serve as the foundation for Batch Masking and Cropping.

After clicking the Start Recording button, all subsequent operations are recorded into a macro.
Undoing operations through the history will also remove them from the recording.

But note that your macro can break if you undo operations in a layer after it was blended into another layer.
As a simple rule: Don't undo operations in layers that were blended into others.

Clicking "Stop & Save" will open a file save dialog to name the macro. This file name will also be the display name.
When the save dialog is cancelled, the recording is paused and can be continued.

Macros must be saved in the qapyq/user/mask-macros/ folder (or subfolders) to be available for selection.

Adding, deleting and changing layers are also recorded. Combined with the "Blend Layers" and condition operations, this allows for powerful computations.
Be careful with the number of expected input layers. If you start recording with more than one layer, the macro might not work properly with Batch Masking/Cropping.

It is currently not possible to record the "Run Macro" operation into a macro.

Image Export

Path Settings

When saving images from the Crop/Scale/Mask tools, the destination path can be set in two ways:

Path
- Use the template to define a dynamic path.
- The file extension used in the path template defines the export format.
- When overwriting is disabled, an increasing counter (e.g. filename_001.ext) is appended to the filename.
- When overwriting is enabled and the file already exists, the preview at the bottom of the toolbars will show the path in red.
  - In this case, files are overwritten! It will not ask for further confirmation!
Dialog
- Ask for the save location each time.
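
The counter that is used when overwriting is disabled could work like in the following sketch. The function name is hypothetical. It assumes a three-digit, zero-padded counter as in the filename_001.ext example. Whether the plain filename is tried first is not specified here, so this sketch always appends a counter.

```python
import os
from typing import Callable


def next_counter_path(path: str, exists: Callable[[str], bool] = os.path.exists) -> str:
    """Return the first path of the form name_001.ext, name_002.ext, ... that is still free."""
    stem, ext = os.path.splitext(path)
    counter = 1
    while True:
        candidate = f"{stem}_{counter:03d}{ext}"
        if not exists(candidate):
            return candidate
        counter += 1
```

The `exists` parameter only makes the function testable without touching the file system. By default it checks the real files.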

The default export path can be changed in the qapyq_config.json file using the path_export setting.
When using a relative path in the template, it will be based on this setting.

The following variables can be used to define a dynamic path. This info is also shown in the application when editing the path template. Functions and variables for referencing entries from the json/txt files are also available. See Templates for more information.

    {{path}}      Image path
    {{path.ext}}  Image path with extension
    {{name}}      Image filename
    {{name.ext}}  Image filename with extension
    {{ext}}       Extension
    {{folder}}    Folder of image
    {{folder-1}}  Parent folder 1 (or 2, 3...)
    {{folder:/}}  Folder hierarchy from given path to image
    {{w}}         Width
    {{h}}         Height
    {{region}}    Crop region number
    {{date}}      Date yyyymmdd
    {{time}}      Time hhmmss

Examples:

    {{name}}_{{w}}x{{h}}.webp
    {{path}}-masklabel.png
    /home/user/Pictures/{{w}}x{{h}}/{{folder}}/{{name}}_{{date}}.jpg
    {{name}}_{{tags.tags#replace:, :_}}.{{ext}}
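
To illustrate how such a template expands, here is a minimal stand-in for the variable substitution. It is not qapyq's template parser: the function is hypothetical, it supports only a handful of variables and no functions, and it assumes that {{folder}} is the name of the image's folder and that {{ext}} has no leading dot. Unknown variables are left unchanged.

```python
import posixpath
import re


def render_template(template: str, image_path: str, width: int, height: int) -> str:
    """Replace a small subset of {{variables}} in a path template."""
    folder_path, filename = posixpath.split(image_path)
    name, ext = posixpath.splitext(filename)
    values = {
        "path": posixpath.splitext(image_path)[0],  # image path without extension
        "path.ext": image_path,
        "name": name,
        "name.ext": filename,
        "ext": ext.lstrip("."),
        "folder": posixpath.basename(folder_path),
        "w": str(width),
        "h": str(height),
    }
    # Unknown variables keep their original {{...}} text.
    return re.sub(r"\{\{([^{}]+)\}\}", lambda m: values.get(m.group(1), m.group(0)), template)


print(render_template("{{name}}_{{w}}x{{h}}.webp", "/data/cats/tom.png", 512, 768))
# tom_512x768.webp
```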

Scaling Presets

The Crop and Scale Tools allow choosing a scaling preset to be used when exporting images.
In addition to defining interpolation methods for up and downscaling, the presets can define which AI model is used for upscaling (optional). Use the Model Settings to create and edit presets.

The presets allow using different models depending on the scaling factor. For each of the three upscale levels you can define a scaling factor as a threshold. When upscaling by a factor larger than this threshold, it will use the associated model.
When the first level has a scaling factor above 1.0 (1.25 for example), it will use the defined upscaling interpolation up to that threshold.

Since the models have a fixed scaling factor, the output images are then further scaled to your selected target size. For this resizing, it will also use the selected interpolation method. This means when you want to scale an image by 1.5 and use an AI model that scales by 2.0, the model's output is downscaled using your selected downscaling interpolation.
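
The interplay between thresholds, AI models and interpolation can be sketched like this. The data class, the model names and the interpolation defaults are hypothetical. The sketch assumes that the level with the highest threshold below the requested factor wins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpscaleLevel:
    threshold: float    # use the model when scaling by more than this factor
    model_name: str     # hypothetical model identifier
    model_scale: float  # fixed scaling factor of the model


def plan_scaling(factor: float, levels: list[UpscaleLevel],
                 interp_up: str = "Lanczos", interp_down: str = "Cubic"
                 ) -> tuple[Optional[str], float, Optional[str]]:
    """Return (model name, remaining factor, interpolation for the remaining factor)."""
    model = None
    for level in sorted(levels, key=lambda lv: lv.threshold):
        if factor > level.threshold:
            model = level
    # The model scales by a fixed factor, the rest is done by interpolation.
    residual = factor / model.model_scale if model else factor
    if residual > 1.0:
        interpolation = interp_up
    elif residual < 1.0:
        interpolation = interp_down
    else:
        interpolation = None
    return (model.model_name if model else None, residual, interpolation)
```

With levels at 1.25 (2x model) and 2.5 (4x model), a factor of 1.5 selects the 2x model and downscales its output by 0.75, while a factor of 1.1 uses only the upscaling interpolation.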

AI scaling has not been implemented for batch cropping yet.

Image Formats and Quality

Anti-Aliasing

When downscaling images to less than half their original size, aliasing or moiré artifacts may appear when the smaller size can't resolve fine details. Anti-aliasing aims to prevent that by slightly blurring the image, which removes those details.

qapyq implements an adaptive anti-aliasing mode that only blurs aliasing-prone regions while leaving other parts sharp. It results in visibly more details, sharper textures, but it also retains more noise and is slower.

Interpolation

Scaling and rotating are performed in a single transformation to minimize quality loss. Interpolation is only applied once.

Linear, Area, Cubic or Lanczos are viable for downscaling when anti-aliasing is enabled.
- I recommend Cubic for sharper results or Linear/Area to remove a bit more noise.
Lanczos is great for upscaling.

Cubic and Lanczos have a sharpening effect and might produce ringing artifacts around high-contrast regions, Lanczos more so than Cubic.

When Area is selected for downscaling, adaptive anti-aliasing is disabled. However, Area is only used in the Scale Tool/Batch Scaling and with a rotation of 0. When cropping or scaling with rotation, it will always use Linear instead of Area (affine transformations in OpenCV don't work with Area).

Formats

For writing images, qapyq uses the Pillow library with an additional JXL plugin. All of Pillow's export formats can be chosen by using the respective file extension, but only the formats listed below use custom settings. For a full list of supported formats and their default settings see the Pillow Documentation.

JPEG is a lossy format. Quality is set to maximum and chroma subsampling is disabled.
JXL export uses lossless compression.
PNG is a lossless format. Its compression is set to the maximum (slower saving but smaller files).
TIFF export uses lossless tiff_lzw compression.
WEBP export uses lossless compression.
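
Expressed as Pillow save options, the settings above might look like the following sketch. The parameter names are Pillow's documented save options. The concrete values (e.g. quality 100 for "maximum") and the `lossless` option for the JXL plugin are assumptions and may differ from what qapyq actually passes.

```python
import os

# Save options per file extension (values are assumptions, see above).
SAVE_OPTIONS = {
    ".jpg":  {"quality": 100, "subsampling": 0},  # 0 = no chroma subsampling (4:4:4)
    ".jpeg": {"quality": 100, "subsampling": 0},
    ".jxl":  {"lossless": True},                  # requires a JXL plugin for Pillow
    ".png":  {"compress_level": 9},               # maximum compression, slower saving
    ".tif":  {"compression": "tiff_lzw"},
    ".tiff": {"compression": "tiff_lzw"},
    ".webp": {"lossless": True},
}


def save_options_for(path: str) -> dict:
    """Look up the save options by file extension (empty for other formats)."""
    ext = os.path.splitext(path)[1].lower()
    return dict(SAVE_OPTIONS.get(ext, {}))


def export_image(image, path: str) -> None:
    """Save a PIL.Image.Image using the options for its file extension."""
    image.save(path, **save_options_for(path))
```

Formats without an entry fall back to Pillow's defaults.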

PNG, WEBP and JXL support transparency (alpha channel). The color channels are preserved.

Although PNG is a widely supported format, it's very slow for saving RGB images. Consider using another lossless format for saving images if your training software allows it.

Color Profiles

Images with a color profile different from sRGB are converted to sRGB when loaded for display and inference.
When saving images, no color profile is written. This is usually interpreted as sRGB and maintains compatibility with training tools.

Auxiliary Windows

qapyq's interface is split into multiple windows which can be placed on multiple monitors.
These windows all depend on the currently loaded and displayed image in the Main Window. When the image changes, so does the content in the auxiliary windows.

Toggle windows by clicking the respective button in the toolbar, or by pressing F1 to F4.

Gallery

The Gallery Window shows thumbnails of the currently loaded images. The images listed in the Gallery are the files that will be processed with the Batch Window.
When a thumbnail is clicked in the gallery, the image is loaded into the Main Window.

Thumbnails are generated when opening the Gallery, or when new files or folders are loaded. They are stored only in memory and are not cached to disk.

At the top of the window, the current folder is displayed and updates as you scroll through the Gallery. Clicking on it lists all loaded folders, and selecting a folder scrolls to its respective images.
The folder paths are shortened: If they have a common root at the beginning, these similar parts are hidden.
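
The shortening of folder paths can be pictured with the standard library's path functions. This sketch is hypothetical and uses POSIX-style paths. It leaves a single folder untouched, which may differ from what the Gallery displays.

```python
import posixpath


def shorten_folders(folders: list[str]) -> list[str]:
    """Hide the common root at the beginning of all folder paths."""
    if len(folders) < 2:
        return list(folders)
    root = posixpath.commonpath(folders)
    return [posixpath.relpath(folder, root) for folder in folders]


print(shorten_folders(["/data/set/cats", "/data/set/dogs/small"]))
# ['cats', 'dogs/small']
```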

These icons can appear on the thumbnails. They show the status and availability of the respective data:

Caption
- White if caption file exists. Red when changed. Green when changed and saved.
Mask
- White if mask exists. Red when changed. Green when changed and saved.
Crop State
- Green if the image was cropped and saved.

Note: The icons can sometimes be inconsistent. Don't rely on them too heavily.

The Gallery supports two viewing modes, which can be changed at the bottom of the window:

Grid View
List View with editable captions

Captions in Gallery

Select the source of captions in the top right corner of the window. Then reload the texts by pressing the reload (↻) button.

Captions are automatically reloaded when changed with the Caption Window. But for performance reasons, the colored text highlighting is not always updated when editing the rules/groups. Press the reload button to manually reload the colors.

Grid View

In Grid View, the captions have to be manually enabled and are shown in place of the filenames. If the text exceeds the maximum height, it is truncated and "..." will show at the bottom.

Activate the "Filter" checkbox to process the captions with the current settings from the Caption Window (if it's open): Only tags from visible groups are shown, all other tags are hidden. Use the filter text field in the Caption Window's Groups tab to hide groups and restrict the displayed tags.

Activating the filter will also apply the current rules, but without adding the prefix and suffix.
This filter is helpful when editing captions for a certain aspect. The Gallery then provides an overview of existing tags and shows where tags are missing.

List View

In List View, captions are always enabled and can be edited in-place. The text fields support navigation between tags with Alt+Arrow keys. Pressing Ctrl+S or the Save button will save the text to the selected destination (as defined in the top right corner of the window).
You can use Ctrl+PageDown and Ctrl+PageUp to navigate to the next/previous caption.

Folder Menu

The images in the Gallery are divided into a section for each folder. Each folder header shows its name at the left, and the number of images at the far right side.

Clicking on the image count or menu icon inside the folder header opens a menu with actions for the folder's images:

Select images
Open Files in New Tab
Unload Files

The folder headers can be disabled by activating "Sort" and unchecking "By Folder".

Image Selection

Multiple images in the Gallery can be selected. Right-clicking on one of the selected images will open a menu:

Clear Selection
Open Selected Files in New Tab
Unload Selected Files

Selected images are shown with a dashed border. The active image, which is displayed in the Main Window, has a solid and thicker border.
The active image is always part of the selection, and the Gallery's status bar shows the number of selected images.

While multiple images are selected, all methods for navigating between images (arrow keys, auto skip, slideshow, etc.) will only move to selected images.

Most notably, when selecting multiple images, the Caption Window will automatically switch to Multi-Edit mode where the combined captions of the selected images can be edited simultaneously. In this mode, the Gallery will additionally highlight images which contain the selected tag from the Caption Window.
See Multi-Edit Mode for more information.

To select images, these options are available:

Hold the left mouse button and drag the mouse over multiple images in the Gallery.
Hold CTRL and click on an image to toggle its selection.
Hold SHIFT and click on an image to select a range of files, beginning at the active image.

For deselecting images:

Hold CTRL+left mouse button and drag the mouse over multiple images.
Hold CTRL and click on an image to toggle its selection.
Hold CTRL+SHIFT and click on an image to deselect a range of files, beginning at the active image, but excluding it.

To clear the selection:

Double-click on an image.
Use the "Clear Selection" option in the right-click context menu.
Select an image that is not part of the selection.

The active image can be changed to any other selected image without clearing the selection.

Clicking on an image while holding the ALT key will open the Compare Tool and load the image to the right side.

Semantic Sort

The thumbnails in the Gallery can be sorted by the image contents and their similarity to a prompt. This can help with finding similar images that need similar captions. Then, you can select them all and use Multi-Edit to add a tag to all captions. It's also helpful for filtering images during acquisition.

To enable semantic sorting, you first have to download and set up an embedding model. The chapter Embedding Settings explains the processing settings.

When clicking the Sort toggle button in the bottom right corner in the Gallery Window, another panel appears with options for semantic sorting:

Embedding model selection
Toggle button for ascending order (∇)
Positive and negative prompt
Checkbox for disabling folder-grouping

Images that match the positive prompt are sorted to the top. Use the negative prompt for aspects which you don't want at the top.
Every word affects the meaning of the prompt. Be specific with aspects that you're actually looking for. But use less specific, neutral terms for aspects which only add context but shouldn't influence the meaning. For example, use "person" instead of woman/man, if the gender is irrelevant and you only want to sort by perspective.

You can sometimes improve the results with a neutral negative prompt. For example:

Positive: a person seen from behind
Negative: a person

You can combine multiple positive or multiple negative prompts by using the | character as the separator in the text:
a person seen from behind|a sitting person|indoors
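
As a rough mental model of how combined prompts could influence the order, here is a minimal sketch. It is an assumption for illustration, not the tool's actual implementation: the helper names (split_prompts, cosine, score), the averaging of similarities, and the two-dimensional toy vectors are all hypothetical. A real embedding model produces much larger vectors.

```python
import math

def split_prompts(text):
    """Split a prompt field on '|' and drop empty parts."""
    return [part.strip() for part in text.split("|") if part.strip()]

def cosine(a, b):
    """Cosine similarity of two equally long vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def score(image_vec, positive_vecs, negative_vecs):
    """Assumed scoring: mean positive similarity minus mean negative similarity."""
    pos = sum(cosine(image_vec, p) for p in positive_vecs) / len(positive_vecs)
    neg = 0.0
    if negative_vecs:
        neg = sum(cosine(image_vec, n) for n in negative_vecs) / len(negative_vecs)
    return pos - neg

prompts = split_prompts("a person seen from behind|a sitting person|indoors")

# Made-up embeddings: first axis ~ "seen from behind", second axis ~ "seen from the front".
images = {"back.jpg": [0.9, 0.1], "front.jpg": [0.1, 0.9]}
positive = [[1.0, 0.0]]   # stands in for "a person seen from behind"
negative = [[0.7, 0.7]]   # stands in for the neutral "a person"

ranked = sorted(images, key=lambda name: score(images[name], positive, negative), reverse=True)
print(prompts)
print(ranked)
```

In this toy setup, the neutral negative prompt lowers both scores by the same amount, so only the aspect that differs (the perspective) decides the order.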

Press Enter after writing a prompt to update the sorting. When images are sorted for the first time, the embeddings have to be created first. You can speed up this process by using multiple hosts, even when running locally. It will take more VRAM, but the embedding models aren't very large. See the Remote Inference chapter for how to set up hosts.

When right-clicking on a thumbnail in the Gallery, you can choose to sort all images by their similarity to the selected file(s).

The images are grouped into folders by default. You can uncheck the By Folders checkbox to disable the folders and sort all images instead.
Unchecking it also works without a prompt. In that case, the images are shown next to each other without folder headers.

Stats

The Stats Window provides summaries of your loaded images in sortable tables. After loading the data, you can select rows in the table and the associated images will be listed on the right. In the case of tags for example, each row is a tag, and the listed files are the images with captions containing that tag.

Rows can be filtered using the text box to the left. It supports regex.
To filter for multiple words at once, write it like this: tag1|tag2 (no spaces)
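
The following sketch shows why the spaces matter. It assumes the filter behaves like a substring search with Python-style regex syntax. The tool's regex flavor and case handling may differ, and filter_rows is a hypothetical helper.

```python
import re

rows = ["black pants", "denim pants", "red shirt", "blue sky"]

def filter_rows(rows, pattern):
    """Keep rows in which the pattern matches anywhere (assumed behavior)."""
    rx = re.compile(pattern)
    return [row for row in rows if rx.search(row)]

# Without spaces: either word may appear anywhere in the row.
print(filter_rows(rows, "pants|shirt"))    # ['black pants', 'denim pants', 'red shirt']

# With spaces: the alternatives become "pants " and " shirt" (spaces included),
# so rows ending in "pants" no longer match.
print(filter_rows(rows, "pants | shirt"))  # ['red shirt']
```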

Hold CTRL while clicking to select multiple rows. Above the list of files you can choose how to combine the selected rows:

Any (Union)
- Lists images which are associated with at least one selected row.
One
- Only lists images which are associated with exactly one row.
Multiple
- Only lists images which are associated with more than one row.
All (Intersection)
- Only lists images which are associated with all selected rows.

The Negate checkbox inverts the list: it displays exactly those images which would be hidden while the checkbox is unchecked.
In the case of tags for example, selecting one tag and negating the list will show all images WITHOUT the tag.
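
The combination modes and the Negate checkbox can be thought of as set operations. This is a minimal sketch with made-up file names. The function names combine and negate are hypothetical, and the mode strings only mirror the labels above.

```python
from collections import Counter

# Each selected row maps to the set of images associated with it.
selected_rows = {
    "tag1": {"a.jpg", "b.jpg"},
    "tag2": {"b.jpg", "c.jpg"},
    "tag3": {"b.jpg", "d.jpg"},
}
all_files = {"a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"}

def combine(row_sets, mode):
    """Return the files to list for the given combination mode."""
    row_sets = list(row_sets)
    # Count in how many selected rows each file appears.
    counts = Counter(f for files in row_sets for f in files)
    if mode == "any":        # union
        return set(counts)
    if mode == "one":        # exactly one row
        return {f for f, c in counts.items() if c == 1}
    if mode == "multiple":   # more than one row
        return {f for f, c in counts.items() if c > 1}
    if mode == "all":        # intersection
        return {f for f, c in counts.items() if c == len(row_sets)}
    raise ValueError(f"unknown mode: {mode}")

def negate(listed, all_files):
    """Invert the list: everything that would otherwise be hidden."""
    return all_files - listed

for mode in ("any", "one", "multiple", "all"):
    print(mode, sorted(combine(selected_rows.values(), mode)))
print("negated any", sorted(negate(combine(selected_rows.values(), "any"), all_files)))
```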

The With Files... button shows actions for the listed images. Most notably, you can open them in a new image tab in the Main Window.
The new tab will only contain the filtered images. Batch processing, the gallery, and also the Stats Window will only handle these filtered images.
This also allows you to chain filters and apply a further restricted selection in the new tab.

Note that each tab has its own state for all the windows and tools.

Tables

Tag Count

Use the selector at the top to change the data source. The captions are split by all defined separators (use \n to split lines).

Additionally, combined tags can be split, so for example black denim pants would be listed as two rows: black pants and denim pants. This is useful if you have manually edited captions or some with rules already applied.
The groups in the Caption Window define which tags are split, and it will only work if the tags fully consist of words which are part of the group. If extra words are present, the tag is listed as-is, without splitting.
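
A simplified sketch of the splitting idea follows. It rests on assumptions that the text above does not confirm: the last word is treated as the shared base term, and the group is modeled as a plain set of words. The function split_combined is hypothetical and the actual logic may differ.

```python
def split_combined(tag, group_words):
    """Split a combined tag into one tag per modifier, or return it as-is."""
    words = tag.split()
    # Only split if every word belongs to the group (no extra words allowed).
    if len(words) < 2 or any(word not in group_words for word in words):
        return [tag]
    base = words[-1]  # assumption: the last word is the shared base term
    return [f"{modifier} {base}" for modifier in words[:-1]]

group = {"black", "denim", "pants"}

print(split_combined("black denim pants", group))         # ['black pants', 'denim pants']
print(split_combined("black ripped denim pants", group))  # ['black ripped denim pants']
```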

This table will show all tags with their total count. It uses the colors of the rules and groups in the Caption Window.
Right-clicking on a row will open the context menu from where you can add the tag to the caption in the Caption Window, to groups, focus or bans. The menu also provides shortcuts for Batch Processing, to add/remove/replace tags in all loaded files.

JSON Keys

This table will show the names of all existing keys found in .json files.
This is useful together with Negate to find images which have no caption for example.

Image Size

This table groups the images by their size.

This is useful for estimating the size buckets for training, and to find buckets which lack images to fill a batch.
It can also be used for filtering out low-res images during image acquisition.

Mask

Masks are loaded using the defined path template. Multiple modes for the stats are available:

White Area: Calculates the area with white pixels. The masks are grouped into buckets and each row displays a range of area values. Files with completely white (1.0) or completely black (0.0) masks are listed in a separate bucket.
White Region Count: Counts the connected white regions.
Black Region Count: Counts the connected black regions.

In all modes, any pixel with a value above 0 is considered white. Use the threshold option to define a different threshold. For example, when all your masks have a background of 0.8, use a threshold of 0.8 to only count foreground pixels.
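
To illustrate the threshold, here is a minimal sketch of the three modes on a tiny mask stored as nested lists with values between 0.0 and 1.0. The strict "greater than" comparison follows the description above. The use of 4-connectivity for regions and the function names are assumptions.

```python
def white_area(mask, threshold=0.0):
    """Fraction of pixels with a value above the threshold."""
    total = sum(len(row) for row in mask)
    white = sum(1 for row in mask for value in row if value > threshold)
    return white / total if total else 0.0

def count_regions(mask, threshold=0.0, white=True):
    """Count connected white (or black) regions using 4-connectivity."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    regions = 0
    for y in range(height):
        for x in range(width):
            if seen[y][x] or (mask[y][x] > threshold) != white:
                continue
            regions += 1
            seen[y][x] = True
            stack = [(y, x)]
            while stack:  # flood fill the region that contains (y, x)
                cy, cx = stack.pop()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width and not seen[ny][nx]
                            and (mask[ny][nx] > threshold) == white):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return regions

# Background of 0.8 with two separate foreground regions.
mask = [
    [0.8, 0.8, 0.8, 0.8],
    [0.8, 1.0, 0.8, 1.0],
    [0.8, 1.0, 0.8, 0.8],
]

print(white_area(mask))                         # 1.0, every pixel is above 0
print(white_area(mask, threshold=0.8))          # 0.25, only the foreground counts
print(count_regions(mask, 0.8))                 # 2 white regions
print(count_regions(mask, 0.8, white=False))    # 1 black region
```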

A row with red text is shown for files without a mask.

If you used detection or segmentation models to generate your masks, I recommend checking the masks with low area (possibly failed detections). The region count stats are useful for checking watermark detection, for example.

File Suffix

This table groups images by existing filename suffixes. It scans the folders and tries to associate files with the loaded images. If a filename begins with the same name as the image, and lies in the same folder, it will be associated with that image.
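
The association rule can be sketched as a prefix match within the same folder. This is a simplified illustration with made-up paths. The helper suffixes_for is hypothetical, and the tool may handle edge cases differently (for example, an image named cat2.png also begins with "cat").

```python
from pathlib import PurePosixPath

def suffixes_for(image, files):
    """Return the suffixes of all files associated with the image."""
    img = PurePosixPath(image)
    found = []
    for name in files:
        path = PurePosixPath(name)
        # Same folder, and the filename begins with the image's name (without extension).
        if path.parent == img.parent and path.name.startswith(img.stem):
            found.append(path.name[len(img.stem):])
    return sorted(found)

files = [
    "set/cat.png",
    "set/cat.txt",
    "set/cat.json",
    "set/cat-masklabel.png",
    "other/cat.txt",   # different folder, not associated
    "set/dog.png",     # different name, not associated
]

print(suffixes_for("set/cat.png", files))
# ['-masklabel.png', '.json', '.png', '.txt']
```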

The file extension counts as a suffix too. The table therefore lists the image formats, and it shows which images have associated .json or .txt files.

If masks are placed next to the image with a distinct suffix (like the default -masklabel.png), it will show images with these masks too.
And selecting that row together with Negate will show images without a mask.

When images exist with duplicate filenames but different extensions (image.png and image.jpg for example) in the same folder, they can be found by selecting multiple extensions and combining them with List files with: Multiple.
(Such images would share the same .txt caption and mask file and should be renamed.)

Folders

This tab shows the loaded folders in a tabular tree, along with the total and relative count of contained images. Values in parentheses are shown for parent folders which themselves contain images.

Hold the mouse cursor over the column headers to display tooltips.

This tree is useful for balancing concepts for training. An estimate for the repeats is shown in the rightmost column.
The value is calculated as: average folder size / folder size
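
As a worked example of this formula, with made-up folder names and image counts. The sketch assumes that the average is the mean image count over all listed folders. Which folders the tool includes in the average is not specified here.

```python
# Hypothetical folders with their image counts.
folder_sizes = {"concept_a": 200, "concept_b": 50, "concept_c": 50}

average_size = sum(folder_sizes.values()) / len(folder_sizes)  # 300 / 3 = 100.0

# repeats = average folder size / folder size
repeats = {name: average_size / size for name, size in folder_sizes.items()}

for name, value in repeats.items():
    print(f"{name}: {value:.2f}")
# concept_a: 0.50
# concept_b: 2.00
# concept_c: 2.00
```

Folders with fewer images than average get a value above 1, and larger folders get a value below 1.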

Batch

The Batch Window provides different ways to process all the loaded files in a tab at once. The files which are processed are the same as those listed in the Gallery.

When clicking the "Start Batch" button, it will first show a confirmation where all actions are summarized. Actions that may overwrite data are shown in red.

Batch Caption

A more detailed guide for captioning can be found here: Captioning

Caption
- Generate new captions and/or tags and save them in a .json file.
- Optionally, use the prompt template to include tags for grounding, which may increase accuracy.
Rules
- Load existing tags from the .json file and transform them using rules.
- Use the Preset... menu at the top left to load or clear the rules.
- The Batch Rules tab has a limited interface and rules cannot be saved. Use the Caption Window to create a full preset.
  - Save the rules to a file and load them in the Batch Window Rules tab.
  - Or load the rules from the Caption Window directly.
Transform
- Send prompts to an LLM to transform existing captions.
- Use variables in the prompt template to load existing captions or tags from the .json files.
Apply
- Save entries from the .json file in a .txt file.
- Or store the entries as another key in the .json file.
- Transform the values using template functions.
- Batch Apply can be used for .json key maintenance.
  - Copy entries by writing text to a different key.
  - Delete entries by writing an empty text to them.
  - Rename keys using the backup functionality:
    - Backup the old value to a key with the new name.
    - And write an empty text to the old key which should be deleted.
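
The key maintenance steps for Batch Apply can be pictured as operations on a dictionary. This sketch only models the described behavior. The function batch_apply, the flat layout, and the key names are hypothetical and don't reflect the tool's actual .json structure.

```python
def batch_apply(entry, key, text, backup_key=None):
    """Write text to a key, optionally backing up the old value first."""
    if backup_key is not None and key in entry:
        entry[backup_key] = entry[key]   # backup the old value
    if text == "":
        entry.pop(key, None)             # writing an empty text deletes the entry
    else:
        entry[key] = text
    return entry

entry = {"caption": "a cat", "tags": "cat, sitting"}

# Copy: write the existing text to a different key.
batch_apply(entry, "caption_copy", entry["caption"])

# Delete: write an empty text to the key.
batch_apply(entry, "tags", "")

# Rename: backup the old value under the new name, then empty the old key.
batch_apply(entry, "caption", "", backup_key="description")

print(entry)  # {'caption_copy': 'a cat', 'description': 'a cat'}
```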

Batch Image Processing

Scale
- Resize images to new dimensions.
Mask
- Run macros to generate masks.
- To create macros, use the Mask Tool and record your operations.
Crop
- Run macros and use the generated mask to define crop regions.
- The size of the cropped images will match the closest entry in the list of Target Size Buckets.
File
- Copy or move files to a new destination.
- Include images, or only their captions or masks.
- Backup captions in a ZIP archive.
- Instead of moving/copying, you can also create symlinks in a folder which refer to the existing files.
  - This is useful for creating subsets for training. Use the Stats Window for filtering images.

Caption

The Caption Window allows manually creating and editing captions.