Database Creation - HaW-Tagger/HWtagger GitHub Wiki

Typical workflow

Database Creation

  • Input database path
  • Specify settings
  • Create database
  • Save database

Adding new images

  • Input database path
  • Specify settings
  • Add new images
  • Save database

Changing database tagger

  • Input database path
  • Specify settings
  • Re-Apply to Database
  • Save database

Database Folder

Enter the path to your database folder. This folder should contain the images you want to tag.

  • The tagger can only use .jpg/.jpeg/.png images
  • The tagger can access all subfolders inside the directory
  • The tagger will ignore all other file types
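
The folder rules above can be sketched in a few lines of Python. This is only an illustration, not the tagger's actual code: the function name `find_images` is hypothetical, and matching extensions case-insensitively (so `.PNG` is accepted) is an assumption.

```python
from pathlib import Path

# Extensions listed on this page; everything else is skipped.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_images(database_path):
    """Return every supported image under database_path, subfolders included."""
    root = Path(database_path)
    return sorted(
        path
        for path in root.rglob("*")  # rglob walks all subfolders
        # Assumption: extensions are compared case-insensitively.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `find_images("path/to/database")` would return a sorted list of `Path` objects and silently skip files such as `.gif` or `.txt`.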

Database Settings

External Tags

  • Online tags: compares the original MD5 of each image against the corresponding databases and retrieves the tags when a match is found.
    • Gelbooru and rule34.xxx in particular can have messy tags, so you can hard-filter them to keep only tags that exist in the danbooru tags snapshot (currently from summer 2024).
  • Tags from files: searches for local text files with the same name as the images and adds them as a tag source.
    • These files are read before the images are renamed to MD5, moved, or converted.
  • Captions from files: searches for local text files with the same name as the images and adds their content as the sentence.
    • These files are read before the images are renamed to MD5, moved, or converted.
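
As a rough sketch of the ideas above (not the tagger's real implementation), the snippet below hashes a file with MD5, reads tags from a text file with the same name, and hard-filters tags against a known tag set. The function names, the `.txt` extension, and the comma-separated tag format are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_md5(path):
    """MD5 hex digest of the file content, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_sidecar_tags(image_path):
    """Read tags from '<image name>.txt' next to the image, if present.

    Assumption: the file holds comma-separated tags.
    """
    sidecar = Path(image_path).with_suffix(".txt")
    if not sidecar.is_file():
        return []
    text = sidecar.read_text(encoding="utf-8")
    return [tag.strip() for tag in text.split(",") if tag.strip()]


def hard_filter(tags, known_tags):
    """Keep only tags present in known_tags (e.g. a danbooru tag snapshot)."""
    return [tag for tag in tags if tag in known_tags]
```

The sidecar file has to be looked up while the image still has its original name, which is why this step runs before any renaming, moving, or conversion.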

Automatic Tagging

Automatic taggers currently available are:

Note: You can ask us to add support for a new model at any time, but we may not be able to for various reasons. We had difficulty integrating Florence2 into the tagger and have given up for now. If you have suggestions for good, lightweight captioning tools, we would find them useful.

Object Detection

YOLO detection is implemented, but the logic built on top of it is not finished. Detection works and you can see the rectangles in the viewer, but the results are not used for anything yet.

Post Tagging Process

  • MD5 renaming: renames images to their MD5, which makes data transfer easier when adding new images.
  • PNG conversion: if you edit your images, this avoids adding further compression artefacts. Compression artefacts are bad for training, so try to limit them.
  • Duplicate images: images with an identical MD5 are treated as duplicates and moved to a dedicated "DUPLICATES" folder.
  • Create groups following directory structure: automatically adds each image to a group based on the folder it is in.
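
The MD5 renaming and duplicate handling described above could look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the tagger's own code: the function name is hypothetical, the location of the "DUPLICATES" folder is passed in by the caller, and the first file found (in sorted order) is the one that is kept.

```python
import hashlib
import shutil
from pathlib import Path


def rename_to_md5(image_paths, duplicates_dir):
    """Rename images to '<md5><ext>' and move content duplicates aside."""
    duplicates_dir = Path(duplicates_dir)
    first_seen = {}  # digest -> first path found with that content
    duplicates = []
    for path in sorted(Path(p) for p in image_paths):
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        if digest in first_seen:
            duplicates.append(path)
        else:
            first_seen[digest] = path

    # Move duplicates first so a later rename can never overwrite one.
    moved = []
    if duplicates:
        duplicates_dir.mkdir(parents=True, exist_ok=True)
    for path in duplicates:
        # Simplification: duplicates sharing a file name would collide here.
        target = duplicates_dir / path.name
        shutil.move(str(path), str(target))
        moved.append(target)

    renamed = {}
    for digest, path in first_seen.items():
        target = path.with_name(digest + path.suffix.lower())
        if target != path:
            path.rename(target)
        renamed[digest] = target
    return renamed, moved
```

Because the new file name depends only on the content, adding the same image again later produces the same name, which is what makes the duplicate easy to spot.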