User Guide - FennelFetish/qapyq GitHub Wiki
Use the toolbar at the top of the Main Window to select tools and toggle windows.
- There's a burger menu (☰) in the top left corner of the Main Window:
- Shows available keyboard shortcuts (global shortcuts that work in all windows).
- Use the Model Settings to configure your models.
- Load images by dragging and dropping files or folders into the Main Window.
- Hold SHIFT while dropping to append files.
- Hold
- Zoom in and out with the mouse wheel.
- Pan the image by dragging with the mouse.
These controls are available in all tools, but some tools may replace them with optimized controls.
For example, the Crop Tool uses the mouse wheel for adjusting the crop region instead of zooming.
In these cases, you can hold CTRL to switch back to zooming and panning.
Reset the view (zoom and pan) with the 0 key.
To navigate between images, press the LEFT or RIGHT arrow key, or the forward/backward mouse button.
Pressing the UP and DOWN arrow keys will navigate to the next or previous folder.
When another window is focused, like when editing captions in the Gallery's list view, you can change the image with Ctrl+PageDown and Ctrl+PageUp.
In addition to the buttons in the tool bar, you can also switch between tools by pressing Ctrl+1 - Ctrl+7.
In this viewing mode you can navigate through images with the mouse wheel.
Press SPACE to start or pause the automatic image changing.
- Right click starts and freezes the measurement.
- Status bar at the bottom of the Main Window displays the values.
- Useful for estimating:
- Mask grow/shrink amount
- Blur radius/kernel size
- Area threshold for the "Filled Area" mask operation
- ...
Load a second image
- by dropping it onto the right side of the Main Window,
- or by clicking on an image in the Gallery while holding the ALT key.
With this tool you can select a region of the image and save that region to a new file, scaled to the selected target size.
Quick instructions:
- Load a bunch of images.
- Set the export path by clicking on the path preview in the bottom right.
- The preview shows the destination path.
- When overwriting files is enabled, the preview will turn red when the file already exists.
- When overwriting is disabled, an increasing counter (_001) is appended to the filename.
- Set the target size at the top of the toolbar.
- Exported images will be scaled to exactly this size.
- Use the mouse to select the crop region in the image.
- Press the left mouse button twice.
- The first click fixes the selection, the second click confirms it.
- To reset the selection, left click outside of the selected region, or use the "Reset Selection" option in the right-click context menu.
- A green flash effect confirms that the export has started.
- The status bar will show a notification when the export has finished.
The toolbar on the right side of the window shows the target size at the top. This is the exact export size.
Using the list of size presets (labeled as "Pre:"), you can quickly change the target size. To edit the presets, open the list and select Setup Sizes... at the bottom.
Below that, it shows the selection size, which is the size of the currently selected region in the image. This region will be scaled to the target size, and the green or red number with the triangle shows the scaling factor.
Downscaling to a factor below 1 is recommended, because it involves less quality loss. Upscaling images results in worse quality and is therefore displayed as red.
To prevent upscaling, the selection size will have a minimum that equals your selected target size.
You can enable Allow Upscale to further shrink your selection.
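The relation between selection size, target size and the displayed scaling factor can be sketched as follows. This is a minimal illustration, not qapyq's code; the function names are hypothetical, and it assumes the selection has the same aspect ratio as the target size.

```python
def scaling_factor(selection_size, target_size):
    """Return the factor by which the selection is scaled to reach the target size.

    Assumes the selection has the same aspect ratio as the target,
    so the width alone determines the factor.
    """
    selection_w, _ = selection_size
    target_w, _ = target_size
    return target_w / selection_w


def is_upscale(factor):
    # Factors above 1 enlarge the selection (shown in red in the toolbar).
    return factor > 1.0


print(scaling_factor((2048, 1536), (1024, 768)))  # 0.5
```

A selection of 2048x1536 exported at 1024x768 is downscaled by 0.5, while a 512x384 selection would need a factor of 2.0 and is only possible with Allow Upscale enabled.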
When Constrain to Image is enabled, which is the default, the selection cannot exit the image boundaries.
When it is disabled and outside regions are selected, they will be exported as transparency if the format allows for that, or black otherwise.
The rotation can be adjusted in 90° steps through the shortcut buttons. Alternatively, you can adjust it in finer steps of 3° with the mouse wheel over the slider, or in finest steps of 0.1° using the mouse wheel over the spin box that displays the number.
The toolbar has a section "Export Settings" which is hidden by default. Click on that title to expand the section. There, you can select a scaling preset which defines the interpolation method used when exporting images.
The setting for "Path" can be changed to "Dialog", which will ask for the path before saving each image.
Read more about it here: Image Export
After an image has been exported, you can open it in a new tab using the Open Last File button. This is useful, for example, if you want to create a mask for it, although it is often faster to create all the masks together after cropping.
Use the mouse to select the crop region in the image. Click two times to export the image.
Adjust the selection size with the mouse wheel:
- Hold SHIFT to adjust in 1px steps.
- Hold CTRL to pan and zoom instead.
After fixing the selection by clicking with the mouse once, the selection rectangle can be adjusted:
- Arrow keys: Move the position of the selection.
- CTRL+Arrow key: Grow the selection in the direction of the arrow key.
- ALT+Arrow key: Shrink the selection from the direction of the arrow key.
- Hold SHIFT to do the above in bigger steps.
- Click on the selection or press Ctrl+E to export the cropped image.
Pressing the right mouse button will open a context menu for quick access to size presets.
Pressing the middle mouse button (or clicking the Swap button) will swap the target size from vertical to horizontal or vice versa.
When another size preset is selected, this orientation is kept.
Right clicking onto the image or pressing Ctrl+E saves it at the defined destination.
The different scaling modes define the target size based on the original image size:
- Fixed
- Scale to exactly this size. May change aspect ratio.
- Fixed Width/Height
- Scale to a fixed width/height and calculate the other side based on aspect ratio.
- Fixed Smaller/Larger Side
- The shorter/longer side is set to the specified length. The other side is calculated based on the aspect ratio.
- Factor
- Multiply both width and height by this factor.
- Pixel Count
- Scale to a total number of pixels while preserving the aspect ratio.
- Quantized
- Scale both sides by a factor and make them a multiple of Q. This may change the aspect ratio.
- Closest/Wider/Taller defines in which way the size is rounded.
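The arithmetic behind the scaling modes above can be sketched in a few lines. This is an illustration under assumptions, not qapyq's implementation: the function names are hypothetical, results are rounded to whole pixels, and the Quantized example only covers a "Closest"-style rounding whose exact behavior in qapyq may differ.

```python
import math


def fixed_width(w, h, target_w):
    # Height follows from the original aspect ratio.
    return target_w, round(h * target_w / w)


def fixed_height(w, h, target_h):
    # Width follows from the original aspect ratio.
    return round(w * target_h / h), target_h


def fixed_smaller_side(w, h, side):
    # The shorter side gets the given length.
    return fixed_width(w, h, side) if w <= h else fixed_height(w, h, side)


def fixed_larger_side(w, h, side):
    # The longer side gets the given length.
    return fixed_width(w, h, side) if w >= h else fixed_height(w, h, side)


def by_factor(w, h, factor):
    return round(w * factor), round(h * factor)


def pixel_count(w, h, pixels):
    # Scaling both sides by sqrt(pixels / area) preserves the aspect ratio.
    factor = math.sqrt(pixels / (w * h))
    return round(w * factor), round(h * factor)


def quantized_closest(w, h, factor, q):
    # Scale, then snap each side to the nearest multiple of q (at least q).
    snap = lambda value: max(q, round(value / q) * q)
    return snap(w * factor), snap(h * factor)


print(fixed_smaller_side(1920, 1080, 720))      # (1280, 720)
print(quantized_closest(1920, 1080, 0.5, 64))   # (960, 512)
```

The last call shows why Quantized may change the aspect ratio: 960x540 is snapped to 960x512, because 540 is not a multiple of 64.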
The toolbar and its export settings are otherwise similar to those in the Crop Tool described in the last chapter.
This tool is for creating and editing greyscale masks for the loaded image. The mask can consist of up to 4 layers, which are exported as the RGBA channels of the resulting image.
Upon loading or changing the image, it attempts to load an existing mask using the defined path template.
Changes to all the mask's layers and their history are kept in memory even when selecting another image. New or loaded masks are not stored, but editing a mask will use additional memory. Although the history is compressed, this may still consume a significant amount of RAM when many masks are edited.
Reloading the mask using the Reload button, or (re)loading a folder will clear the cached masks and history.
The Export button in the bottom right will turn red when changes are made.
To save the current layers, click this Export button or press Ctrl+E.
The following operations are provided for editing. If not specified, a left or right click into the image will apply the operation.
Basic Operations:
- Brush
- Draw brush strokes while holding left mouse button.
- Erase while holding right mouse button.
- Adjust brush size with mouse wheel. Hold SHIFT to adjust in steps of 10.
- Supports pressure-sensitive pens and graphics tablets (pressure adjusts color).
- Rectangle
- Draw a rectangle while holding left mouse button.
- Erase: Draw black rectangle while holding right mouse button.
- Flood Fill
- Fill region with the selected color (left click).
- Fill region with black (right click).
- Clear
- Reset whole layer to the selected color (left click).
- Reset to the inverse of the selected color (right click).
- Invert
- Threshold
- Pixels above the selected color are set to 1.0.
- Pixels below the selected color are set to 0.0.
- Normalize
- The range of values is adjusted to fit into the selected min and max color.
- Quantize
- Applies the selected mode to blocks of the mask and creates a coarse, pixelized grid.
- This is intended to make drawing masks more predictable with regard to the VAE compression, but the training tool has to handle it by using the right interpolation method (nearest) for scaling.
- Morphology
- Grow (dilate), Shrink (erode), Close Holes (closing), Open Gaps (opening)
- The border setting defines how pixels outside the image are interpreted.
- Reflect: fedcba|abcdefgh|hgfedcb
- Replicate: aaaaaa|abcdefgh|hhhhhhh
- Const Black / Const White treats outside pixels as 0.0 and 1.0, respectively.
- Gaussian Blur
- Softens the layer
- Direction "outwards" keeps bright pixels bright.
- Direction "inwards" keeps dark pixels dark.
- Blend Layers
- Blends another layer into the currently active layer.
- Most of the modes are commutative, meaning the result is the same whether A is blended into B, or B into A.
- The exception is "Subtract", which subtracts the source layer from the active layer.
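The border settings of the Morphology operation can be illustrated on a single row of pixels. The sketch below reproduces the patterns listed for Reflect and Replicate using plain strings; the function names are hypothetical and the code only demonstrates the padding idea, not how qapyq applies it.

```python
def pad_reflect(row, left, right):
    # Mirror the row at its edges, repeating the edge pixel itself.
    # Only valid while the padding is not longer than the row.
    return row[:left][::-1] + row + row[::-1][:right]


def pad_replicate(row, left, right):
    # Repeat the first and last pixel outwards.
    return row[0] * left + row + row[-1] * right


def pad_const(row, left, right, fill):
    # Treat everything outside the image as one constant value.
    return fill * left + row + fill * right


print(pad_reflect("abcdefgh", 6, 7))    # fedcbaabcdefghhgfedcb
print(pad_replicate("abcdefgh", 6, 7))  # aaaaaaabcdefghhhhhhhh
```

With Const Black, a Grow operation near the image edge sees only dark pixels outside, while Replicate continues whatever value sits at the edge.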
Generate:
- Detect Padding
- Detects padding or borders in the image and fills these regions with the selected color in the mask.
- The padding regions in the image need to have a similar color, e.g. black. The tolerance setting can be increased if the colors are too different.
- The min and max color settings define the range of colors in the image that can be detected as padding (the image is converted to greyscale for the detection). Use this to only detect black borders and ignore white borders, for example.
- It only works for straight padding at the top/left/right/bottom. It can't fully detect diagonal padding for rotated images.
- Centroid Rectangle
- Creates a maximally enlarged rectangle around the centroid of white pixels in the mask.
- Helps with cropping images to new aspect ratios while keeping the region of interest (use a detection model).
- The rectangle will have the defined aspect ratio. Its orientation depends on the orientation of the image.
- All pixels above the value 0 are used to calculate the geometric center.
- If the mask is completely black, the rectangle is placed at the center of the image.
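One plausible reading of the Centroid Rectangle operation is sketched below. It is an interpretation, not qapyq's algorithm: it assumes the rectangle is the largest one with the given aspect ratio that fits into the image, centered on the centroid and shifted back inside the image where necessary. The mask is a plain list of rows, and all names are hypothetical.

```python
def centroid(mask):
    """Geometric center of all pixels above 0, or the image center if none."""
    height, width = len(mask), len(mask[0])
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value > 0:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return (width - 1) / 2, (height - 1) / 2
    return sum_x / count, sum_y / count


def clamp(value, low, high):
    return max(low, min(value, high))


def centroid_rect(mask, aspect):
    """Return (x, y, w, h) of the rectangle for an aspect like (3, 2)."""
    height, width = len(mask), len(mask[0])
    long_side, short_side = max(aspect), min(aspect)
    # The rectangle's orientation follows the image's orientation.
    ratio = long_side / short_side if width >= height else short_side / long_side
    if width / height >= ratio:
        rect_h = height
        rect_w = min(width, round(height * ratio))
    else:
        rect_w = width
        rect_h = min(height, round(width / ratio))
    cx, cy = centroid(mask)
    # Center on the centroid, then shift back inside the image.
    x0 = clamp(round(cx - (rect_w - 1) / 2), 0, width - rect_w)
    y0 = clamp(round(cy - (rect_h - 1) / 2), 0, height - rect_h)
    return x0, y0, rect_w, rect_h
```

For an 8x4 mask with a single white pixel at the right edge and a 1:1 aspect ratio, this yields a 4x4 rectangle pushed against the right border, so the detected region stays inside the crop.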
Conditions:
- Condition: Color Range
- Retrieves the minimum and maximum color values from the mask layer and compares them to the specified range.
- Sets the whole layer to white if the values are inside the range.
- Sets the whole layer to black otherwise.
- Use this in macros to, for example, conditionally invert an empty black input layer to white when some images were manually masked and others were not.
- Condition: Filled Area
- Counts the non-zero pixels in the mask layer and compares the area (in percentage) to the specified range.
- Sets the whole layer to white if the area is inside the range, to black otherwise.
- Can be used in macros to discard wrong detections. Florence2 will always return a result, even when the prompted object is not present. Typically, the wrong box will have a large area.
- Use the Measure Tool to estimate the area percentage.
- Condition: Region Count
- Retrieves the number of connected white regions from the mask layer and compares it to the specified range.
- Sets the whole layer to white if the values are inside the range, to black otherwise.
- Use this together with "Detect" and "Blend Layers" in macros to, for example, conditionally skip images with multiple people. Or to only process images where the head and feet are detected.
- All pixel values above 0 will connect regions. If your background is not black, you may need to binarize your mask first using "Threshold".
- You can copy the layer with the detections to a new layer (Blend Layers - Add), do the processing there and evaluate the condition, then multiply the result back into the original layer.
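The Filled Area and Region Count conditions can be sketched on a mask given as a list of rows. This is a simplified illustration, not qapyq's code: the function names are hypothetical, regions are connected through their 4 direct neighbors, and the range check is inclusive, all of which are assumptions.

```python
from collections import deque


def filled_area_percent(mask):
    # Percentage of pixels with a value above 0.
    total = sum(len(row) for row in mask)
    filled = sum(1 for row in mask for value in row if value > 0)
    return 100.0 * filled / total


def region_count(mask):
    # Count connected regions of pixels above 0 with a breadth-first flood fill.
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] > 0 and not seen[y][x]:
                count += 1
                seen[y][x] = True
                queue = deque([(x, y)])
                while queue:
                    cx, cy = queue.popleft()
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < width and 0 <= ny < height:
                            if mask[ny][nx] > 0 and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((nx, ny))
    return count


def condition(value, low, high):
    # White layer (1.0) if the value is inside the range, black (0.0) otherwise.
    return 1.0 if low <= value <= high else 0.0
```

Because every value above 0 connects regions, a grey background would merge everything into one region, which is why binarizing with "Threshold" first can be necessary.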
User-Defined:
- Detect
- Run a model that detects boxes (e.g. face detection).
- Segment
- Run a model that segments the image with pixel-accuracy (e.g. masks foreground and removes background).
- Run Macro
- Replay a previously recorded macro.
A history of operations is kept per layer.
When you step back in the history and then apply a new operation, the history entries after that point are deleted.
Up to 20 steps can be undone. The maximum history length can be configured in qapyq_config.json by changing the mask_history_size setting.
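The described history behavior (bounded length, entries after the current position dropped when a new operation is applied) corresponds to a common undo stack. The class below is a generic sketch of that data structure with hypothetical names; it does not reflect how qapyq stores or compresses its history.

```python
class History:
    def __init__(self, max_size=20):
        self.max_size = max_size
        self.entries = []  # operations, oldest first
        self.index = 0     # number of entries currently applied

    def apply(self, operation):
        # Applying a new operation discards everything that was undone.
        del self.entries[self.index:]
        self.entries.append(operation)
        # Drop the oldest entry when the limit is exceeded.
        if len(self.entries) > self.max_size:
            self.entries.pop(0)
        self.index = len(self.entries)

    def undo(self):
        if self.index > 0:
            self.index -= 1

    def redo(self):
        if self.index < len(self.entries):
            self.index += 1


history = History(max_size=3)
for op in ["brush", "fill", "blur", "invert"]:
    history.apply(op)
history.undo()
history.undo()
history.apply("threshold")
print(history.entries)  # ['fill', 'threshold']
```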
Macros are sets of operations that describe how a mask is formed. They serve as the foundation for Batch Masking and Cropping.
After clicking the Start Recording button, all subsequent operations are recorded into a macro.
Undoing operations through the history will also remove them from the recording.
But note that your macro can break if you undo operations in a layer after it was blended into another layer.
As a simple rule: Don't undo operations in layers that were blended into others.
Clicking "Stop & Save" will open a file save dialog to name the macro. This file name will also be the display name.
When the save dialog is cancelled, the recording is paused and can be continued.
Macros must be saved in the qapyq/user/mask-macros/ folder (or subfolders) to be available for selection.
Adding, deleting and changing layers are also recorded. Combined with the "Blend Layers" and condition operations, this allows for powerful computations.
Be careful with the number of expected input layers. If you start recording with more than one layer, the macro might not work properly with Batch Masking/Cropping.
It is currently not possible to record the "Run Macro" operation into a macro.
When saving images from the Crop/Scale/Mask tools, the destination path can be set in two ways:
- Path
- Use the template to define a dynamic path.
- The file extension used in the path template defines the export format.
- When overwriting is disabled, an increasing counter (e.g. filename_001.ext) is appended to the filename.
- When overwriting is enabled and the file already exists, the preview at the bottom of the toolbars will show the path in red.
- In this case, files are overwritten! It will not ask for further confirmation!
- Dialog
- Ask for the save location each time.
The default export path can be changed in the qapyq_config.json file using the path_export setting.
When using a relative path in the template, it will be based on this setting.
The following variables can be used to define a dynamic path. This info is also shown in the application when editing the path template. Functions and variables for referencing entries from the json/txt files are also available. See Templates for more information.
{{path}} Image path
{{path.ext}} Image path with extension
{{name}} Image filename
{{name.ext}} Image filename with extension
{{ext}} Extension
{{folder}} Folder of image
{{folder-1}} Parent folder 1 (or 2, 3...)
{{folder:/}} Folder hierarchy from given path to image
{{w}} Width
{{h}} Height
{{region}} Crop region number
{{date}} Date yyyymmdd
{{time}} Time hhmmss
Examples:
{{name}}_{{w}}x{{h}}.webp
{{path}}-masklabel.png
/home/user/Pictures/{{w}}x{{h}}/{{folder}}/{{name}}_{{date}}.jpg
{{name}}_{{tags.tags#replace:, :_}}.{{ext}}
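To make the variable substitution more concrete, here is a small sketch that expands a subset of the variables listed above. It is not qapyq's template engine: the function name is hypothetical, `{{folder}}` is assumed to be the folder name and `{{ext}}` the extension without the dot, and unknown variables (such as the `{{tags...}}` functions) are left untouched. `PurePosixPath` is used so that the example behaves the same on every platform.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath


def render_template(template, image_path, width, height, now=None):
    path = PurePosixPath(image_path)
    now = now or datetime.now()
    values = {
        "path": str(path.with_suffix("")),
        "path.ext": str(path),
        "name": path.stem,
        "name.ext": path.name,
        "ext": path.suffix.lstrip("."),
        "folder": path.parent.name,
        "w": str(width),
        "h": str(height),
        "date": now.strftime("%Y%m%d"),
        "time": now.strftime("%H%M%S"),
    }

    def replace(match):
        # Keep variables this sketch does not know about as they are.
        return values.get(match.group(1).strip(), match.group(0))

    return re.sub(r"\{\{(.*?)\}\}", replace, template)


print(render_template("{{name}}_{{w}}x{{h}}.webp", "/data/set/cat.png", 1024, 768))
# cat_1024x768.webp
```

The file extension at the end of the rendered path (`.webp` here) is what selects the export format.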
The Crop and Scale Tools allow choosing a scaling preset to be used when exporting images.
In addition to defining interpolation methods for up and downscaling, the presets can define which AI model is used for upscaling (optional).
Use the Model Settings to create and edit presets.
The presets allow using different models depending on the scaling factor. For each of the three upscale levels you can define a scaling factor as a threshold. When upscaling by a factor larger than this threshold, it will use the associated model.
When the first level has a scaling factor above 1.0 (1.25 for example), it will use the defined upscaling interpolation up to that threshold.
Since the models have a fixed scaling factor, the output images are then further scaled to your selected target size. For this resizing, it will also use the selected interpolation method. This means when you want to scale an image by 1.5 and use an AI model that scales by 2.0, the model's output is downscaled using your selected downscaling interpolation.
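The selection of an upscale level and the follow-up resize can be sketched like this. It is a simplified model of the behavior described above, not qapyq's code; the function names, the step labels and the model names are hypothetical.

```python
def resize_step(factor):
    # Plain interpolation for whatever scaling remains.
    if factor > 1.0:
        return [("upscale-interpolation", factor)]
    if factor < 1.0:
        return [("downscale-interpolation", factor)]
    return []


def plan_scaling(factor, levels):
    """levels: list of (threshold, model_name, model_factor) tuples."""
    chosen = None
    for threshold, model_name, model_factor in sorted(levels):
        # The highest level whose threshold is exceeded wins.
        if factor > threshold:
            chosen = (model_name, model_factor)
    if chosen is None:
        return resize_step(factor)
    model_name, model_factor = chosen
    # The model scales by a fixed factor; the rest is done by interpolation.
    return [("model", model_name)] + resize_step(factor / model_factor)


levels = [(1.25, "example-2x", 2.0), (2.5, "example-4x", 4.0)]
print(plan_scaling(1.5, levels))
# [('model', 'example-2x'), ('downscale-interpolation', 0.75)]
```

With a first threshold of 1.25, a factor of 1.1 never reaches a model and is handled by the upscaling interpolation alone.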
AI scaling has not been implemented for batch cropping yet.
When downscaling images to less than half their original size, aliasing or moiré artifacts may appear when the smaller size can't resolve fine details. Anti-aliasing aims to prevent that by slightly blurring the image, which removes those details.
qapyq implements an adaptive anti-aliasing mode that only blurs aliasing-prone regions while leaving other parts sharp. It results in visibly more details, sharper textures, but it also retains more noise and is slower.
Scaling and rotating are performed in a single transformation to minimize quality loss. Interpolation is only applied once.
- Linear, Area, Cubic or Lanczos are viable for downscaling when anti-aliasing is enabled.
- I recommend Cubic for sharper results or Linear/Area to remove a bit more noise.
- Lanczos is great for upscaling.
Cubic and Lanczos have a sharpening effect and might produce ringing artifacts around high contrast regions, Lanczos more so than Cubic.
When Area is selected for downscaling, adaptive anti-aliasing is disabled. However, Area is only used in the Scale Tool/Batch Scaling and with a rotation of 0. When cropping or scaling with rotation, it will always use Linear instead of Area (affine transformations in OpenCV don't work with Area).
For writing images, qapyq uses the Pillow library with an additional JXL plugin. All of Pillow's export formats can be chosen by using the respective file extension, but only the formats listed below use custom settings. For a full list of supported formats and their default settings see the Pillow Documentation.
- JPEG is a lossy format. Quality is set to maximum and chroma subsampling is disabled.
- JXL export uses lossless compression.
- PNG is a lossless format. Its compression is set to the maximum (slower saving but smaller files).
- TIFF export uses lossless tiff_lzw compression.
- WEBP export uses lossless compression.
PNG, WEBP and JXL support transparency (alpha channel). The color channels are preserved.
Although PNG is a widely supported format, it's very slow for saving RGB images. Consider using another lossless format for saving images, if your training software allows.
Images with a color profile different from sRGB are converted to sRGB when loaded for display and inference.
When saving images, no color profile is written. This is usually interpreted as sRGB and maintains compatibility with training tools.
qapyq's interface is split into multiple windows which can be placed on multiple monitors.
These windows all depend on the currently loaded and displayed image in the Main Window. When the image changes, so does the content in the auxiliary windows.
Toggle windows by clicking the respective button in the tool bar, or by pressing F1 - F4.
The Gallery Window shows thumbnails of the currently loaded images. The images listed in the Gallery are the files that will be processed with the Batch Window.
When a thumbnail is clicked in the gallery, the image is loaded into the Main Window.
Thumbnails are generated when opening the Gallery, or when new files or folders are loaded. They are stored only in memory and are not cached to disk.
At the top of the window, the current folder is displayed and updates as you scroll through the Gallery. Clicking on it lists all loaded folders, and selecting a folder scrolls to its respective images.
The folder paths are shortened: If they have a common root at the beginning, these similar parts are hidden.
These icons can appear on the thumbnails. They show the status and availability of the respective data:
- Caption
- White if caption file exists. Red when changed. Green when changed and saved.
- Mask
- White if mask exists. Red when changed. Green when changed and saved.
- Crop State
- Green if the image was cropped and saved.
Note
The icons can sometimes be inconsistent. Don't overly rely on them.
The Gallery supports two viewing modes, which can be changed at the bottom of the window:
- Grid View
- List View with editable captions
Select the source of captions in the top right corner of the window. Then reload the texts by pressing the reload (↻) button.
Captions are automatically reloaded when changed with the Caption Window. But for performance reasons, the colored text highlighting is not always updated when editing the rules/groups. Press the reload button to manually reload the colors.
In Grid View, the captions have to be manually enabled and are shown in place of the filenames. If the text exceeds the maximum height, it is truncated and "..." will show at the bottom.
Activate the "Filter" checkbox to process the captions with the current settings from the Caption Window (if it's open): Only tags from visible groups are shown, all other tags are hidden. Use the filter text field in the Caption Window's Groups tab to hide groups and restrict the displayed tags.
Activating the filter will also apply the current rules, but without adding the prefix and suffix.
This filter is helpful when editing captions for a certain aspect. The Gallery then provides an overview of existing tags and shows where tags are missing.
In List View, captions are always enabled and can be edited in-place. The text fields support navigation between tags with Alt+Arrow keys. Pressing Ctrl+S or the Save button will save the text to the selected destination (as defined in the top right corner of the window).
You can use Ctrl+PageDown and Ctrl+PageUp to navigate to the next/previous caption.
The images in the Gallery are divided into a section for each folder. Each folder header shows its name at the left, and the number of images at the far right side.
Clicking on the image count or menu icon inside the folder header opens a menu with actions for the folder's images:
- Select images
- Open Files in New Tab
- Unload Files
The folder headers can be disabled by activating "Sort" and unchecking "By Folder".
Multiple images in the Gallery can be selected. Right-clicking on one of the selected images will open a menu:
- Clear Selection
- Open Selected Files in New Tab
- Unload Selected Files
Selected images are shown with a dashed border. The active image, which is displayed in the Main Window, has a solid and thicker border.
The active image is always part of the selection, and the Gallery's status bar shows the number of selected images.
While multiple images are selected, all methods for navigating between images (arrow keys, auto skip, slideshow, etc.) will only move to selected images.
Most notably, when selecting multiple images, the Caption Window will automatically switch to Multi-Edit mode where the combined captions of the selected images can be edited simultaneously. In this mode, the Gallery will additionally highlight images which contain the selected tag from the Caption Window.
See Multi-Edit Mode for more information.
To select images, these options are available:
- Hold the left mouse button and drag the mouse over multiple images in the Gallery.
- Hold CTRL and click on an image to toggle its selection.
- Hold SHIFT and click on an image to select a range of files, beginning at the active image.
For deselecting images:
- Hold CTRL + left mouse button and drag the mouse over multiple images.
- Hold CTRL and click on an image to toggle its selection.
- Hold CTRL+SHIFT and click on an image to deselect a range of files, beginning at the active image, but excluding it.
To clear the selection:
- Double-click on an image.
- Use the "Clear Selection" option in the right-click context menu.
- Select an image that is not part of the selection.
The active image can be changed to any other selected image without clearing the selection.
Clicking on an image while holding the ALT key will open the Compare Tool and load the image to the right side.
The thumbnails in the Gallery can be sorted by the image contents and their similarity to a prompt. This can help with finding similar images that need similar captions. Then, you can select them all and use Multi-Edit to add a tag to all captions. It's also helpful for filtering images during acquisition.
To enable semantic sorting, you first have to download and set up an embedding model. The chapter Embedding Settings explains the processing settings.
When clicking the Sort toggle button in the bottom right corner in the Gallery Window, another panel appears with options for semantic sorting:
- Embedding model selection
- Toggle button for ascending order (∇)
- Positive and negative prompt
- Checkbox for disabling folder-grouping
Images that match the positive prompt are sorted to the top. Use the negative prompt for aspects which you don't want at the top.
Every word affects the meaning of the prompt. Be specific with aspects that you're actually looking for. But use less specific, neutral terms for aspects which only add context but shouldn't influence the meaning. For example, use "person" instead of woman/man, if the gender is irrelevant and you only want to sort by perspective.
You can sometimes improve the results with a neutral negative prompt. For example:
- Positive: a person seen from behind
- Negative: a person
You can combine multiple positive or multiple negative prompts by using the | character as the separator in the text:
a person seen from behind|a sitting person|indoors
Press Enter after writing a prompt to update the sorting. When images are sorted for the first time, the embeddings have to be created first. You can speed up this process by using multiple hosts, even when running locally. It will take more VRAM, but the embedding models aren't very large. See the Remote Inference chapter for how to set up hosts.
When right-clicking on a thumbnail in the Gallery, you can choose to sort all images by their similarity to the selected file(s).
The images are grouped into folders by default. You can uncheck the By Folders checkbox to disable the folders and sort all images instead.
Unchecking it also works without prompt and the images will be shown next to each other without folder headers.
The Stats Window provides summaries of your loaded images in sortable tables. After loading the data, you can select rows in the table and the associated images will be listed on the right. In the case of tags for example, each row is a tag, and the listed files are the images with captions containing that tag.
Rows can be filtered using the text box to the left. It supports regex.
To filter for multiple words at once, write it like this: tag1|tag2 (no spaces)
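The effect of the `|` alternation, and why spaces around it matter, can be tried out with Python's `re` module. This is only a demonstration of the regex syntax; whether qapyq matches case-sensitively or anywhere in the row is an assumption here, and the function name is hypothetical.

```python
import re


def filter_rows(rows, pattern):
    # Keep rows in which the pattern matches anywhere in the text.
    regex = re.compile(pattern)
    return [row for row in rows if regex.search(row)]


rows = ["smile", "black hair", "hat", "outdoors"]
print(filter_rows(rows, "hair|hat"))    # ['black hair', 'hat']
print(filter_rows(rows, "hair | hat"))  # []
```

In the second call the spaces become part of the alternatives, so the pattern looks for "hair " with a trailing space or " hat" with a leading space and finds neither.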
Hold CTRL while clicking to select multiple rows. Above the list of files you can choose how to combine the selected rows:
- Any (Union)
- Lists images which are associated with at least one selected row.
- One
- Only lists images which are associated with exactly one row.
- Multiple
- Only lists images which are associated with more than one row.
- All (Intersection)
- Only lists images which are associated with all selected rows.
The Negate checkbox will invert the list and display all images which would otherwise be hidden if the checkbox was unchecked.
In the case of tags for example, selecting one tag and negating the list will show all images WITHOUT the tag.
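The four combination modes and the Negate option map directly onto counting in how many selected rows each file appears. The sketch below models each row as a set of file names; the function and its parameters are hypothetical and only illustrate the logic.

```python
def combine(rows, mode, all_files=(), negate=False):
    """rows: list of sets, one set of files per selected table row."""
    counts = {}
    for files in rows:
        for name in files:
            counts[name] = counts.get(name, 0) + 1

    checks = {
        "any": lambda c: c >= 1,           # union
        "one": lambda c: c == 1,           # exactly one selected row
        "multiple": lambda c: c > 1,       # more than one selected row
        "all": lambda c: c == len(rows),   # intersection
    }
    result = {name for name, c in counts.items() if checks[mode](c)}
    if negate:
        # Show everything that would otherwise be hidden.
        result = set(all_files) - result
    return result


rows = [{"a.png", "b.png"}, {"b.png", "c.png"}]
all_files = {"a.png", "b.png", "c.png", "d.png"}
print(sorted(combine(rows, "one")))                          # ['a.png', 'c.png']
print(sorted(combine(rows, "any", all_files, negate=True)))  # ['d.png']
```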
The With Files... button shows actions for the listed images. Most notably, you can open them in a new image tab in the Main Window.
The new tab will only contain the filtered images. Batch processing, the gallery, and also the Stats Window will only handle these filtered images.
This also allows you to chain filters and apply a further restricted selection in the new tab.
Note that each tab has its own state for all the windows and tools.
Use the selector at the top to change the data source. The captions are split by all defined separators (use \n to split lines).
Additionally, combined tags can be split, so for example black denim pants would be listed as two rows: black pants and denim pants. This is useful if you have manually edited captions or some with rules already applied.
The groups in the Caption Window define which tags are split, and it will only work if the tags fully consist of words which are part of the group. If extra words are present, the tag is listed as-is, without splitting.
This table will show all tags with their total count. It uses the colors of the rules and groups in the Caption Window.
Right-clicking on a row will open the context menu from where you can add the tag to the caption in the Caption Window, to groups, focus or bans. The menu also provides shortcuts for Batch Processing, to add/remove/replace tags in all loaded files.
This table will show the names of all existing keys found in .json files.
This is useful together with Negate to find images which have no caption for example.
This table groups the images by their size.
This is useful for estimating the size buckets for training, and to find buckets which lack images to fill a batch.
It can also be used for filtering out low-res images during image acquisition.
Masks are loaded using the defined path template. Multiple modes for the stats are available:
- White Area: Calculates the area with white pixels. The masks are grouped into buckets and each row displays a range of area values. Files with completely white (1.0) or completely black (0.0) masks are listed in a separate bucket.
- White Region Count: Counts the connected white regions.
- Black Region Count: Counts the connected black regions.
In all modes, any pixel with a value above 0 is considered white. Use the threshold option to define a different threshold. For example, when all your masks have a background of 0.8, use a threshold of 0.8 to only count foreground pixels.
A row with red text is shown for files without mask.
If you used detection or segmentation models to generate your masks, I recommend checking the masks with low area (possibly failed detections). The region count stats are useful for checking watermark detection, for example.
This table groups images by existing filename suffixes. It scans the folders and tries to associate files with the loaded images. If a filename begins with the same name as the image, and lies in the same folder, it will be associated with that image.
One suffix is the file extension. It will show image formats and show the images which have associated .json or .txt files.
If masks are placed next to the image with a distinct suffix (like the default -masklabel.png), it will show images with these masks too.
And selecting that row together with Negate will show images without mask.
When images exist with duplicate filenames but different extension (image.png and image.jpg for example) in the same folder, they can be found by selecting multiple extensions and combining them with List files with: Multiple.
(Such images would share the same .txt caption and mask file and should be renamed.)
This tab shows the loaded folders in a tabular tree, along with the total and relative count of contained images. Values in parentheses are shown for parent folders which themselves contain images.
Hold the mouse cursor over the column headers to display tooltips.
This tree is useful for balancing concepts for training. An estimate for the repeats is shown in the rightmost column.
The value is calculated as: average folder size / folder size
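As a worked example of this formula, the following sketch computes the estimate for a few folders. The function name and folder names are made up, and it assumes the average is taken over all listed folders that contain images.

```python
def estimate_repeats(folder_sizes):
    """folder_sizes: dict mapping folder name to its number of images."""
    sizes = {name: count for name, count in folder_sizes.items() if count > 0}
    average = sum(sizes.values()) / len(sizes)
    # Smaller folders get more repeats, larger folders fewer.
    return {name: average / count for name, count in sizes.items()}


repeats = estimate_repeats({"cats": 100, "dogs": 50, "birds": 150})
print(repeats["dogs"])  # 2.0
```

With an average of 100 images per folder, the folder with 50 images would need about 2 repeats to weigh as much as an average folder, and the one with 150 images about 0.67.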
The Batch Window provides different ways to process all the loaded files in a tab at once. The files which are processed are the same as those listed in the Gallery.
When clicking the "Start Batch" button, it will first show a confirmation where all actions are summarized. Actions that may overwrite data are shown in red.
A more detailed guide for captioning can be found here: Captioning
- Caption
- Generate new captions and/or tags and save them in a .json file.
- Optionally, use the prompt template to include tags for grounding which may potentially increase accuracy.
- Rules
- Load existing tags from the .json file and transform them using rules.
- Use the Preset... menu at the top left to load or clear the rules.
- The Batch Rules tab has a limited interface and rules cannot be saved. Use the Caption Window to create a full preset.
- Save the rules to a file and load them in the Batch Window Rules tab.
- Or load the rules from the Caption Window directly.
- Transform
- Send prompts to an LLM to transform existing captions.
- Use variables in the prompt template to load existing captions or tags from the .json files.
- Apply
- Save entries from the .json file in a .txt file.
- Or store the entries as another key in the .json file.
- Transform the values using template functions.
- Batch Apply can be used for .json key maintenance:
- Copy entries by writing text to a different key.
- Delete entries by writing an empty text to them.
- Rename keys using the backup functionality:
- Backup the old value to a key with the new name.
- And write an empty text to the old key which should be deleted.
- Scale
- Resize images to new dimensions.
- Mask
- Run macros to generate masks.
- To create macros, use the Mask Tool and record your operations.
- Crop
- Run macros and use the generated mask to define crop regions.
- The size of the cropped images will match the closest entry in the list of Target Size Buckets.
- File
- Copy or move files to a new destination.
- Include images, or only their captions or masks
- Backup captions in a ZIP archive
- Instead of moving/copying, you can also create symlinks in a folder which refer to the existing files.
- This is useful for creating subsets for training. Use the Stats Window for filtering images.
The Caption Window allows manually creating and editing captions.
A more detailed guide for captioning can be found here: Captioning