neuston_net TRAIN - WHOIGit/ifcb_classifier GitHub Wiki

python neuston_net.py TRAIN path/to/DATASET BASE_MODEL TRAINING_ID
usage: neuston_net.py TRAIN [-h] [--model-id MODEL_ID] [--img-norm MEAN STD] [--seed SEED] [--split T:V]
                            [--untrain] [--class-min MIN] [--class-max MAX] [--emax MAX] [--emin MIN] [--estop STOP]
                            [--class-config CSV COL] [--flip {x,y,xy,x+V,y+V,xy+V}]
                            [--outdir OUTDIR] [--epochs-log EPOCHS_LOG] [--args-log ARGS_LOG]
                            [--results FNAME [SERIES ...]]
                            SRC MODEL TRAINING_ID

positional arguments:
  SRC                   Directory with class-labeled subfolders. May also be a dataset-configuration csv.
  MODEL                 Base model architecture, e.g. "inception_v3".
  TRAINING_ID           Training ID. Also used to build the default values of --outdir and --model-id.
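SRC can be a directory whose subfolders are named after classes. A minimal sketch of how such a directory maps to labeled samples, assuming each immediate subfolder name is the class label and that only files with common image extensions count. `list_labeled_images` and `IMAGE_EXTS` are hypothetical names, neuston_net's own loader may differ, and the dataset-configuration csv form of SRC is not covered.

```python
import os

# Hypothetical set of accepted image extensions; adjust to your data.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def list_labeled_images(src_dir):
    """Return sorted (image_path, class_label) pairs from class-labeled subfolders."""
    samples = []
    for label in sorted(os.listdir(src_dir)):
        class_dir = os.path.join(src_dir, label)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files at the top level of SRC
        for fname in sorted(os.listdir(class_dir)):
            if os.path.splitext(fname)[1].lower() in IMAGE_EXTS:
                samples.append((os.path.join(class_dir, fname), label))
    return samples
```

Sorting both the class folders and the file names keeps the sample order stable across runs, which matters once a seeded split is applied.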

optional arguments:
  -h, --help            show this help message and exit

Model Adjustments:
  --untrain             If set, initializes MODEL without pretrained weights.
                        Default (unset) is to start from a model pretrained on ImageNet.
  --img-norm MEAN STD   Normalize images by MEAN and STD, given as single values or as comma-separated
                        per-channel values. E.g. "0.667 0.161" or "0.056,0.058,0.051 0.067,0.071,0.057".
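The --img-norm values can be read as per-channel means and standard deviations applied as (x - mean) / std. A minimal sketch of parsing the two arguments and normalizing one pixel, assuming pixel values are already scaled to [0, 1] and that comma-separated values are listed in channel order. `parse_norm` and `normalize_pixel` are hypothetical helpers, not neuston_net's code.

```python
def parse_norm(mean_str, std_str):
    """Parse --img-norm MEAN STD: one number each, or comma-separated per-channel numbers."""
    mean = [float(v) for v in mean_str.split(",")]
    std = [float(v) for v in std_str.split(",")]
    if len(mean) != len(std):
        raise ValueError("MEAN and STD must list the same number of channels")
    return mean, std


def normalize_pixel(pixel, mean, std):
    """Normalize one pixel (a tuple of per-channel values in [0, 1]) as (x - mean) / std."""
    return tuple((x - m) / s for x, m, s in zip(pixel, mean, std))
```

For example, `parse_norm("0.667", "0.161")` yields one-channel statistics, while the second help-text example yields three.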

Dataset Adjustments:
  --seed SEED             Set a specific seed for deterministic output and reproducible dataset splitting.
  --split T:V             Per-class ratio of images randomly split into Training and Validation datasets.
                          The split is controlled by SEED. Default is "80:20".
  --class-config CSV COL  Skip and combine classes as defined by column COL of a CSV configuration file.
  --class-min MIN         Exclude classes with fewer than MIN instances. Default is 2.
  --class-max MAX         Limit each class to at most MAX instances.
                          If multiple datasets are specified with a dataset-configuration csv, 
                          classes from lower-priority datasets are truncated first. 
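The --split, --seed, --class-min, and --class-max options combine into a per-class, seeded random split. A minimal sketch under stated assumptions: classes below `class_min` are dropped, oversized classes are truncated to a random subset, and each class's training count is `round(n * T / (T + V))`. `split_dataset` is a hypothetical helper; the priority-based truncation across multiple datasets is not modeled.

```python
import random
from collections import defaultdict


def split_dataset(samples, split="80:20", seed=None, class_min=2, class_max=None):
    """Per-class random Training/Validation split of (path, label) pairs."""
    t, v = (float(x) for x in split.split(":"))
    train_frac = t / (t + v)
    rng = random.Random(seed)  # same seed -> same split

    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    train, val = [], []
    for label in sorted(by_class):  # fixed class order keeps RNG use deterministic
        paths = sorted(by_class[label])
        if len(paths) < class_min:
            continue  # --class-min: exclude classes with too few instances
        rng.shuffle(paths)
        if class_max is not None:
            paths = paths[:class_max]  # --class-max: keep a random subset of MAX
        n_train = round(len(paths) * train_frac)
        train += [(p, label) for p in paths[:n_train]]
        val += [(p, label) for p in paths[n_train:]]
    return train, val
```

Splitting within each class, rather than over the pooled images, keeps rare classes represented in both Training and Validation.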

Epoch Parameters:
  --emax MAX            Maximum number of training epochs. Default is 60.
  --emin MIN            Minimum number of training epochs. Default is 10.
  --estop STOP          Early stopping: stop training once STOP epochs have passed without a new
                        best epoch. Set STOP=0 to disable. Default is 10.
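The three epoch parameters interact as follows: training never runs past --emax, never stops before --emin, and otherwise stops once --estop epochs pass without a new best. A minimal simulation of that rule, assuming the "best epoch" is the one with the lowest validation loss so far (the help text does not say which metric defines it). `epochs_to_run` is a hypothetical helper.

```python
def epochs_to_run(val_losses, emax=60, emin=10, estop=10):
    """Return how many epochs would run under --emax/--emin/--estop.

    val_losses[i] is the validation loss after epoch i + 1.
    Assumption: the best epoch is the one with the lowest validation loss so far.
    """
    best_loss, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses[:emax], start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        if epoch < emin:
            continue  # never stop before the minimum number of epochs
        if estop > 0 and epoch - best_epoch >= estop:
            return epoch  # early stop: STOP epochs without a new best
    return min(len(val_losses), emax)
```

With the defaults, a run whose best epoch is epoch 3 stops at epoch 13.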

Augmentation Options:
  Data augmentation can improve training results by simulating novel input.

  --flip {x,y,xy,x+V,y+V,xy+V}
                        Training images have a 50% chance of being flipped along the designated axis:
                        (x) vertically, (y) horizontally, (xy) either or both.
                        Append "+V" to also apply flipping to the Validation dataset.
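A minimal sketch of the --flip behavior on a 2-D image stored as a list of rows, following the help text's mapping of (x) to a vertical flip and (y) to a horizontal flip. Assumption: in "xy" mode each axis is flipped independently with 50% probability, which yields "either or both". `random_flip` and the other helpers are hypothetical names, not neuston_net's implementation.

```python
import random


def flip_vertical(img):
    """Mirror top-to-bottom by reversing the order of rows."""
    return img[::-1]


def flip_horizontal(img):
    """Mirror left-to-right by reversing each row."""
    return [row[::-1] for row in img]


def random_flip(img, mode, rng=random):
    """Apply a --flip style augmentation; the "+V" suffix is ignored here.

    Assumption: each requested axis is flipped independently with 50% probability.
    """
    axes = mode.replace("+V", "")
    if "x" in axes and rng.random() < 0.5:
        img = flip_vertical(img)  # (x): vertical flip
    if "y" in axes and rng.random() < 0.5:
        img = flip_horizontal(img)  # (y): horizontal flip
    return img


def augments_validation(mode):
    """True when "+V" asks for the Validation dataset to be flipped as well."""
    return mode.endswith("+V")
```

Under this assumption, "xy" produces the original, a vertical flip, a horizontal flip, or both, each with 25% probability.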

Output Options:
  --outdir OUTDIR       Default is "training-output/{TRAINING_ID}"
  --model-id MODEL_ID   Default is "{date}__{TRAINING_ID}"
  --epochs-log EPOCHS_LOG
                        CSV filename for the per-epoch log of epoch, loss, validation loss, and F1 scores.
                        Default is "epochs.csv".
  --args-log ARGS_LOG   Human-readable YAML output filename listing all user-specified
                        and default training parameters. Default is "args.yml".
  --results FNAME [SERIES ...]
                        FNAME: Validation-results filename or pattern.
                               The only supported pattern is "{epoch}". Accepts .json, .h5, and .mat file formats.
                        SERIES: Data to include in the FNAME file.
                                The following are always included and need not be specified: 
                                    model_id, timestamp, class_labels, input_classes, output_classes.
                                Options are: image_basenames image_fullpaths
                                             output_scores output_winscores 
                                             confusion_matrix
                                             classes_by_{count|f1|recall|precision}
                                             {f1|recall|precision}_{macro|weighted|perclass} 
                                             {counts|val_counts|train_counts}_perclass
                        --results may be specified multiple times in order to create different files. 
                        If not invoked, the default options are:
                            FNAME = results.mat
                            SERIES = image_basenames output_scores counts_perclass 
                                     confusion_matrix f1_perclass f1_weighted f1_macro
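A minimal sketch of how the output defaults and a --results spec resolve, based only on the defaults listed above. Assumptions: {date} renders as YYYY-MM-DD, "{epoch}" is replaced by the plain epoch number, and a spec without SERIES falls back to the default series list. `default_outputs`, `resolve_results`, and the constant names are hypothetical.

```python
import datetime
import os

# Series written to every results file, per the --results help text.
ALWAYS_INCLUDED = ["model_id", "timestamp", "class_labels", "input_classes", "output_classes"]
# Defaults used when --results is not invoked.
DEFAULT_FNAME = "results.mat"
DEFAULT_SERIES = ["image_basenames", "output_scores", "counts_perclass",
                  "confusion_matrix", "f1_perclass", "f1_weighted", "f1_macro"]
RESULT_FORMATS = {".json", ".h5", ".mat"}


def default_outputs(training_id, date=None):
    """Default --outdir and --model-id values (assumption: {date} is YYYY-MM-DD)."""
    date = date or datetime.date.today()
    return f"training-output/{training_id}", f"{date.isoformat()}__{training_id}"


def resolve_results(fname=DEFAULT_FNAME, series=None, epoch=None):
    """Expand one --results spec into (filename, full series list)."""
    if os.path.splitext(fname)[1].lower() not in RESULT_FORMATS:
        raise ValueError(f"unsupported results format: {fname}")
    if epoch is not None:
        fname = fname.replace("{epoch}", str(epoch))  # the "{epoch}" pattern
    requested = DEFAULT_SERIES if series is None else series  # assumption for series=None
    return fname, ALWAYS_INCLUDED + [s for s in requested if s not in ALWAYS_INCLUDED]
```

Because --results may be given several times, each spec would be resolved separately into its own file.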
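The f1_perclass, f1_macro, and f1_weighted series presumably follow the standard definitions: per-class F1 is the harmonic mean of precision and recall, the macro score is the unweighted mean over classes, and the weighted score averages per-class F1 by each class's number of true instances. A minimal sketch computing all three from a confusion matrix, assuming rows are true classes, columns are predicted classes, and undefined ratios count as 0. `f1_scores` is a hypothetical helper, not neuston_net's code.

```python
def f1_scores(cm):
    """Per-class, macro, and weighted F1 from a square confusion matrix.

    Assumption: cm[i][j] counts images of true class i predicted as class j.
    Undefined ratios (division by zero) are treated as 0.
    """
    n = len(cm)
    support = [sum(cm[i]) for i in range(n)]  # true instances per class
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    perclass = []
    for k in range(n):
        tp = cm[k][k]
        precision = tp / predicted[k] if predicted[k] else 0.0
        recall = tp / support[k] if support[k] else 0.0
        denom = precision + recall
        perclass.append(2 * precision * recall / denom if denom else 0.0)
    macro = sum(perclass) / n
    total = sum(support)
    weighted = sum(f * s for f, s in zip(perclass, support)) / total if total else 0.0
    return perclass, macro, weighted
```

The macro score treats rare and common classes equally, while the weighted score is dominated by the largest classes; reporting both helps spot poor performance on rare classes.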