UTIL CLASS_CONFIG - WHOIGit/ifcb_classifier GitHub Wiki

Class Config CSV

A Class Config CSV allows class names to be combined, renamed, and excluded at model training runtime. This is useful when you want to combine or exclude certain classes without manually maintaining multiple copies of a baseline dataset. Class configurations save you from the time-consuming, disk-space-consuming, error-prone copy/rename/delete processes otherwise needed to train on variations of a dataset.

To use a Class Config CSV, pass the CSV filename followed by one of its configuration names after the --class-config flag of neuston_net.py TRAIN. Example usage is shown below.

Consider the following example Dataset Directory D1 and its Class Config CSV, D1_classconfig.csv.

path/to/
└─ D1/
   ├─ amoeba/
   ├─ Diatom_sp1/
   ├─ Diatom_sp2/
   ├─ unknown1/
   └─ bad/

D1_classconfig.csv:

path/to/D1, CONFIG1, CONFIG2
    amoeba,  Amoeba,       1
Diatom_sp1,       1,  diatom
Diatom_sp2,       1,  diatom
  unknown1,       1,       0
       bad,       0,       0

Example Usage:

neuston_net.py TRAIN path/to/D1 inception_v3 YourTrainingID --class-config D1_classconfig.csv CONFIG2

Note the following properties common to all Class Config CSVs:

  • The first column is a list of all available classes in the baseline dataset
    • Class names are case sensitive and duplicate names are invalid
    • The very first cell (the header of the first column) names the baseline dataset, though it may be left blank without consequence
  • Subsequent columns each represent a particular class configuration
    • Each configuration column header must be uniquely named. This is the Configuration Name
    • There may be any number of configuration columns
  • A 0 in a configuration column cell indicates that the corresponding class should be excluded for that configuration
  • A 1 in a configuration column cell indicates that the corresponding class should be included, under its original name, for that configuration
  • Text in a configuration column cell includes the class and renames it to that text. Classes that are given the same text are combined into a single output class.
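
The rules above can be sketched in a few lines of Python. This is only an illustration of the 0 / 1 / text semantics described on this page, not the code neuston_net.py actually runs; the function name resolve_class_config is hypothetical, and the CSV text is the D1_classconfig.csv example from above.

```python
import csv
import io

EXAMPLE_CSV = """\
path/to/D1, CONFIG1, CONFIG2
    amoeba,  Amoeba,       1
Diatom_sp1,       1,  diatom
Diatom_sp2,       1,  diatom
  unknown1,       1,       0
       bad,       0,       0
"""

def resolve_class_config(csv_text, config_name):
    """Hypothetical helper: map each baseline class to its output class name.

    Excluded classes (cell value 0) are left out of the returned mapping.
    """
    rows = [r for r in csv.reader(io.StringIO(csv_text)) if r]
    header = [cell.strip() for cell in rows[0]]
    if config_name not in header[1:]:
        raise ValueError(f"unknown configuration name: {config_name}")
    col = header.index(config_name, 1)  # skip the first (baseline dataset) cell

    mapping, seen = {}, set()
    for row in rows[1:]:
        cells = [cell.strip() for cell in row]
        class_name, value = cells[0], cells[col]
        if class_name in seen:
            raise ValueError(f"duplicate class name: {class_name}")
        seen.add(class_name)
        if value == "0":
            continue                      # 0: exclude the class
        # 1: keep the original name; any other text: rename (and possibly combine)
        mapping[class_name] = class_name if value == "1" else value
    return mapping

for name in ("CONFIG1", "CONFIG2"):
    mapping = resolve_class_config(EXAMPLE_CSV, name)
    print(name, sorted(set(mapping.values())))
```

Classes that map to the same output name (here Diatom_sp1 and Diatom_sp2 under CONFIG2) end up as one output class, which is how combining falls out of renaming.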

The example Class Config CSV above has two configurations:

  • CONFIG1
    • the class bad is excluded
    • the class amoeba is renamed to be capitalized for aesthetic purposes
    • A model trained with this configuration would have four output classes: Amoeba, Diatom_sp1, Diatom_sp2, unknown1
  • CONFIG2
    • Diatom_sp1 and Diatom_sp2 are combined into a single diatom class
    • bad and unknown1 are excluded
    • A model trained with this configuration would have only two output classes: amoeba and diatom.

In the example usage above, CONFIG2 is used, which results in a model with only two output classes. The --class-config flag accepts exactly two arguments: the Class Config CSV and one configuration name from that CSV. See Dataset Params or the following excerpt.

  --class-config CSV COL  Skip and combine classes as defined by column COL of a CSV configuration file.
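
For readers unfamiliar with flags that take two values, the shape of --class-config can be illustrated with Python's argparse. This is a minimal sketch and not the parser from neuston_net.py; the function name build_parser and the program name train_sketch are hypothetical.

```python
import argparse

def build_parser():
    """Hypothetical parser with one flag that takes exactly two values."""
    parser = argparse.ArgumentParser(prog="train_sketch")
    parser.add_argument(
        "--class-config",
        nargs=2,                      # exactly two values: CSV then COL
        metavar=("CSV", "COL"),
        help="Skip and combine classes as defined by column COL "
             "of a CSV configuration file.",
    )
    return parser

args = build_parser().parse_args(
    ["--class-config", "D1_classconfig.csv", "CONFIG2"]
)
csv_path, config_name = args.class_config  # a two-element list
print(csv_path, config_name)
```

With nargs=2, supplying only the CSV filename or more than one configuration name is rejected at argument-parsing time.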

Making a baseline Class Config CSV using neuston_util.py

A baseline Class Config CSV for a given dataset can be generated using neuston_util.py MAKE_CLASS_CONFIG. The command automatically generates the first column, listing all classes available in the dataset, and a second column containing a generic all-inclusive configuration of all 1's. It is then up to the user to edit the CSV: rename or combine classes, exclude classes, change the column header to a meaningful configuration name, or add further (uniquely named) configuration columns.

Example Usage

neuston_util.py MAKE_CLASS_CONFIG path/to/D1 -o D1_classconfig.csv

usage: neuston_util.py MAKE_CLASS_CONFIG [-h] [-o OUTFILE] PATH

positional arguments:
  PATH                  path to a dataset directory or dataset configuration csv file.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTFILE            Specify an output file. If unset, outputs to stdout.
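
To give a feel for what such a baseline file contains, the sketch below builds an equivalent CSV from a Dataset Directory in plain Python. It is an approximation under stated assumptions and not the implementation of MAKE_CLASS_CONFIG: the function name make_baseline_class_config, the default column header "CONFIG1", and the alphabetical ordering of classes are all hypothetical choices, since this page does not state what the real command uses.

```python
import csv
import io
import os
import tempfile

def make_baseline_class_config(dataset_dir, config_name="CONFIG1"):
    """Hypothetical helper: one row per class subdirectory, all set to 1."""
    classes = sorted(
        entry.name for entry in os.scandir(dataset_dir) if entry.is_dir()
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([dataset_dir, config_name])   # first cell: baseline dataset
    for class_name in classes:
        writer.writerow([class_name, 1])          # 1 = include, name unchanged
    return out.getvalue()

# Demo on a throwaway directory standing in for path/to/D1.
with tempfile.TemporaryDirectory() as root:
    for class_name in ("amoeba", "Diatom_sp1", "bad"):
        os.mkdir(os.path.join(root, class_name))
    baseline = make_baseline_class_config(root, config_name="ALL")

print(baseline)
```

Starting from an all-1's column like this, each further configuration is just another column edited by hand.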

Misc.

A Dataset Configuration CSV and Class Config CSV may be used in conjunction. First, the Dataset Configuration CSV should be generated and edited as desired; it represents a novel Dataset. Then, use the Dataset Configuration CSV (instead of a Dataset Directory) to generate the baseline Class Config CSV.

eg: neuston_util.py MAKE_CLASS_CONFIG path/to/D1D2_config.csv -o D1D2_classconfig.csv

From there, edit the Class Config CSV as usual. The final training command would look as follows:

eg: neuston_net.py TRAIN path/to/D1D2_config.csv inception_v3 SomeTrainingID --class-config D1D2_classconfig.csv CONFIG1