dataset_captioning_standards - OpenDiffusionAI/wiki GitHub Wiki

Work in Progress

Note: This document is a work in progress and is not yet an official standard.

Manual(?) captioning standards

Ideally, every human-based image in our datasets would carry all of the items below. That is a goal for AI-captioned images and a requirement for all manually captioned ones.

The items listed here should be of interest for all human-based photos. Wherever possible, they should follow existing naming standards rather than inventing our own.

With that in mind, try using the danbooru site’s autocomplete search to standardize wording. For example, “facing ” completes to “viewer”, not “camera”.

shot type: (medium, full, headshot, etc.; we need the full list here)
physical identity: ethnicity, plus male/female/“nonbinary person”. E.g.: “Asian male”. Use “caucasian” instead of “white”, and “african” instead of “black”.
makeup: “bareface” (or should we use “no makeup”?), “evening makeup” (?), ...
facing: left, right, up, down, away from viewer, facing viewer
looking: (same words as facing but specifically about the eyes)
pose(s): (be creative here. Need lots of potentially overlapping items)
hair: length, type, color. E.g.: “long wavy brown hair”, “short curly blue hair”
clothes: (could be a rabbit hole, but at least give basic colors?) If there is a clear fashion style, mention it. If their clothes indicate an occupation, mention it.
?? what else ??
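
As a rough illustration of how the categories above could be combined into one caption, here is a minimal Python sketch. The class name `HumanCaption`, its field names, the comma-joined output, and the category order are all assumptions made for this example; nothing here is an agreed format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HumanCaption:
    """Hypothetical container: one field per category listed above."""
    shot_type: str = ""
    identity: str = ""
    makeup: str = ""
    facing: str = ""
    looking: str = ""
    poses: List[str] = field(default_factory=list)
    hair: str = ""
    clothes: str = ""

    def to_caption(self) -> str:
        # Join the categories in a fixed order, skipping any left empty.
        parts = [
            self.shot_type,
            self.identity,
            self.makeup,
            self.facing,
            self.looking,
            *self.poses,
            self.hair,
            self.clothes,
        ]
        return ", ".join(p for p in parts if p)


example = HumanCaption(
    shot_type="medium shot",
    identity="Asian male",
    facing="facing viewer",
    looking="looking left",
    poses=["sitting"],
    hair="long wavy brown hair",
)
print(example.to_caption())
```

Keeping each category in its own field, rather than writing free text, would make it easier to check later which categories a caption is missing.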

Fixed definitions

The following caption categories are proposed to allow only fixed, approved values:

gender
ethnicity
camera shot
looking 
facing
hair color/length/style
 (This is important: along with fixed words to use,
  we need to actually DEFINE what constitutes "long", "short", etc.
  Otherwise there will be disagreement.)
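
An approved-values-only rule like this could be enforced with a small check. The sketch below is one possible approach, not an agreed tool: the `APPROVED` table and the `find_violations` function are hypothetical names, and the value sets are copied from the draft lists earlier on this page. Ethnicity, camera shot, and hair are left out because their lists are not settled yet.

```python
from typing import Dict, List, Set

# Draft values taken from the lists above; none of this is final.
DIRECTIONS: Set[str] = {
    "left", "right", "up", "down", "away from viewer", "facing viewer",
}

APPROVED: Dict[str, Set[str]] = {
    "gender": {"male", "female", "nonbinary person"},
    "facing": DIRECTIONS,
    # "looking" uses the same words as "facing", but refers to the eyes.
    "looking": DIRECTIONS,
}


def find_violations(caption: Dict[str, str]) -> List[str]:
    """Return one message per fixed category that is missing or unapproved."""
    problems: List[str] = []
    for category, allowed in APPROVED.items():
        value = caption.get(category)
        if value is None:
            problems.append(f"{category}: missing")
        elif value not in allowed:
            problems.append(f"{category}: '{value}' is not an approved value")
    return problems


print(find_violations({"gender": "man", "facing": "left"}))
```

Free-form categories such as poses and clothes are deliberately not checked here, since only the categories listed above are proposed to have fixed values.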

Examples

We need a sample image with proper captions here.