dataset_captioning_standards - OpenDiffusionAI/wiki GitHub Wiki
Work in Progress
Note: This document is not yet an official standard; it is a work in progress.
Manual(?) captioning standards
Ideally, every image of a human in our dataset would be captioned with all of the items below. For AI-captioned images this remains an ideal; for manually captioned images it is required in 100% of cases.
The items listed here should be relevant to all photos of humans. Where possible, they should also follow existing naming standards rather than inventing our own.
With that in mind, try using the danbooru site’s autocomplete search to standardize wording. For example, “facing ” completes to “viewer”, not “camera”.
- shot type: medium, full, headshot, etc. (we need the full list here)
- physical identity: ethnicity, male/female/“nonbinary person”. E.g.: “Asian male”. Use “caucasian” for white and “african” for black.
- makeup: “bareface” (or should we use “no makeup”?), “evening makeup”(?), ...
- facing: left, right, up, down, away from viewer, facing viewer
- looking: same words as facing, but specifically about the eyes
- pose(s): be creative here; we need lots of potentially overlapping items
- hair: length, type, color. E.g.: “long wavy brown hair”, “short curly blue hair”
- clothes: could be a rabbit hole, but at least give basic colors. If there is a clear fashion style, mention it. If the clothes indicate an occupation, mention it.
- ?? what else ??
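As an illustration only, the items above could be assembled into a single caption string as sketched below. The field names, the comma-separated output format, and the helper names (`build_caption`, `normalize_identity`) are assumptions made for this example and are not part of any agreed standard. The only term substitutions included are the two named in the list (white to caucasian, black to african).

```python
# Minimal sketch: assemble a caption from per-category fields.
# Field names and the comma-separated format are assumptions, not a standard.

# Only these two substitutions are taken from the list above.
ETHNICITY_SYNONYMS = {"white": "caucasian", "black": "african"}

# Order follows the list above; "?? what else ??" is left open.
FIELD_ORDER = [
    "shot_type", "identity", "makeup", "facing",
    "looking", "pose", "hair", "clothes",
]


def normalize_identity(text: str) -> str:
    """Replace discouraged ethnicity words with the preferred ones."""
    words = text.split()
    return " ".join(ETHNICITY_SYNONYMS.get(w.lower(), w) for w in words)


def build_caption(fields: dict[str, str]) -> str:
    """Join the non-empty fields in FIELD_ORDER with ', '."""
    parts = []
    for name in FIELD_ORDER:
        value = fields.get(name, "").strip()
        if not value:
            continue  # skip categories the captioner left blank
        if name == "identity":
            value = normalize_identity(value)
        parts.append(value)
    return ", ".join(parts)


caption = build_caption({
    "shot_type": "headshot",
    "identity": "White male",
    "facing": "facing viewer",
    "hair": "long wavy brown hair",
})
print(caption)  # headshot, caucasian male, facing viewer, long wavy brown hair
```

Keeping the fields separate until the final join makes it easy to reorder or add categories once the list is settled.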
Fixed definitions
The following caption categories are proposed to accept only fixed, approved values:
gender
ethnicity
camera shot
looking
facing
hair color/length/style
(This is important because, along with fixed words to use, we need to actually DEFINE what constitutes "long", "short", etc. Otherwise there will be disagreement.)
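A fixed vocabulary like this could be enforced with a small check such as the sketch below. The approved sets here are placeholders assembled from words that appear earlier on this page; they are not a ratified list. The category keys and the function name `find_violations` are likewise hypothetical. Camera shot and hair are left out because their value lists and definitions have not been written yet.

```python
# Minimal sketch: flag caption fields whose value is not on the approved list.
# All sets below are placeholders drawn from this page, not a final standard.

FACING_VALUES = {
    "left", "right", "up", "down", "away from viewer", "facing viewer",
}

APPROVED = {
    "gender": {"male", "female", "nonbinary person"},
    "ethnicity": {"asian", "caucasian", "african"},
    "facing": FACING_VALUES,
    # The page says "looking" uses the same words as "facing".
    "looking": FACING_VALUES,
}


def find_violations(fields: dict[str, str]) -> list[tuple[str, str]]:
    """Return (category, value) pairs that are not in the approved set.

    Categories without an approved set yet are not checked.
    """
    violations = []
    for category, value in fields.items():
        allowed = APPROVED.get(category)
        if allowed is not None and value.lower() not in allowed:
            violations.append((category, value))
    return violations


print(find_violations({
    "gender": "male",
    "ethnicity": "white",   # should be "caucasian"
    "facing": "camera",     # should be "facing viewer"
    "pose": "sitting",      # no fixed list, so not checked
}))
# [('ethnicity', 'white'), ('facing', 'camera')]
```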
Examples
We need a sample image with proper captions here.