TRAIN output_options - WHOIGit/ifcb_classifier GitHub Wiki
Output Options
As a result of training, several review files are created in addition to the trained model file. The following options allow a user to change the default output settings of model training.
Many of the options below support formatting tags: certain keywords in curly braces are replaced by their corresponding variable values. A common example is {TRAINING_ID}. At runtime, the tag (including its curly braces) is replaced by the training id given as the 3rd neuston_net TRAIN positional argument (neuston_net.py TRAIN SRC MODEL TRAINING_ID).
Output Options:
--outdir OUTDIR Default is "training-output/{TRAINING_ID}"
--model-id ID Set a specific model id. Patterns {TRAIN_DATE} and {TRAIN_ID} are recognized. Default is "{TRAIN_ID}"
--epochs-log ELOG Specify a csv filename. Includes epoch, loss, validation loss, and f1 scores.
Default is "epochs.csv".
--args-log ALOG Specify a human-readable yml output filename containing all user-specified
and default training parameters. Default is "args.yml".
--onnx Additionally output an onnx version of the model.
--results FNAME [SERIES ...]
FNAME: Specify a validation-results filename or pattern.
Valid patterns are: "{epoch}". Accepts .json .h5 and .mat file formats.
SERIES: Data to include in FNAME file.
The following are always included and need not be specified:
model_id, timestamp, class_labels, input_classes, output_classes.
Options are: image_basenames image_fullpaths
output_scores output_winscores
confusion_matrix
classes_by_{count|f1|recall|precision}
{f1|recall|precision}_{macro|weighted|perclass}
{counts|val_counts|train_counts}_perclass
--results may be specified multiple times in order to create different files.
If not invoked, the default options are:
FNAME = results.mat
SERIES = image_basenames output_scores counts_perclass
confusion_matrix f1_perclass f1_weighted f1_macro
Output Directory (--outdir)
OUTDIR specifies the output directory for all files produced by a neuston_net TRAIN run. It defaults to "training-output/{TRAINING_ID}". If the path is not absolute, it is interpreted relative to the current working directory from which neuston_net.py was invoked. If the specified directory doesn't exist, it will be created.
Available formatting tags are:
{TRAIN_DATE} - the date training was initiated, in "YYYY-MM-DD" format.
{TRAINING_ID} - the training id specified by the 3rd neuston_net TRAIN positional argument (neuston_net.py TRAIN SRC MODEL TRAINING_ID).
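As a rough illustration of the tag substitution and path handling described above, the sketch below fills in the tags, resolves a relative path against the working directory, and creates the directory. It assumes the tags behave like Python str.format fields; resolve_outdir is a hypothetical helper, not part of neuston_net.

```python
import os
from datetime import date


def resolve_outdir(pattern, training_id, train_date=None, cwd=None):
    """Hypothetical helper: fill OUTDIR tags, resolve the path, create the directory.

    Assumes {TRAINING_ID} and {TRAIN_DATE} behave like str.format fields.
    """
    train_date = train_date or date.today()
    outdir = pattern.format(
        TRAINING_ID=training_id,
        TRAIN_DATE=train_date.strftime("%Y-%m-%d"),  # "YYYY-MM-DD"
    )
    # Relative paths are resolved against the directory neuston_net.py was run from.
    if not os.path.isabs(outdir):
        outdir = os.path.join(cwd or os.getcwd(), outdir)
    # Create the directory (and any missing parents) if it does not exist yet.
    os.makedirs(outdir, exist_ok=True)
    return outdir
```

With the default pattern and a training id of MyRun, this yields training-output/MyRun under the current working directory.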
Model ID (--model-id)
Sets both the filename and the internally stored name of the output model. The model ID can also be used as a formatting tag during neuston_net RUN.
By default this is "{TRAIN_DATE}__{TRAINING_ID}", which results in a model file with a filename such as 2021-12-21__SomeParticularTrainingName.ptl.
Available formatting tags are:
{TRAIN_DATE} - the date training was initiated, in "YYYY-MM-DD" format.
{TRAINING_ID} - the training id specified by the 3rd neuston_net TRAIN positional argument (neuston_net.py TRAIN SRC MODEL TRAINING_ID).
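The default model ID can be reproduced with a few lines of Python, assuming the tags behave like str.format fields. model_file_name is a hypothetical helper. It supplies both {TRAINING_ID} and {TRAIN_ID} because the --help text above spells the tag {TRAIN_ID}; str.format silently ignores keyword arguments a pattern does not use.

```python
from datetime import date


def model_file_name(training_id, train_date, model_id="{TRAIN_DATE}__{TRAINING_ID}"):
    """Hypothetical helper: expand a --model-id pattern into a .ptl filename."""
    expanded = model_id.format(
        TRAIN_DATE=train_date.strftime("%Y-%m-%d"),
        TRAINING_ID=training_id,
        TRAIN_ID=training_id,  # spelling used in the --help text; unused kwargs are ignored
    )
    return expanded + ".ptl"


print(model_file_name("SomeParticularTrainingName", date(2021, 12, 21)))
# prints 2021-12-21__SomeParticularTrainingName.ptl
```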
ONNX (--onnx)
If this flag is included, an .onnx version of the model is created alongside the .ptl model. ONNX is a widely supported open-source format for AI/ML model inference.
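An exported .onnx file can be run without PyTorch, for example with ONNX Runtime. The sketch below is a minimal example under several assumptions not stated on this page: the onnxruntime package is installed, the model has a single image-tensor input, the batch is already preprocessed exactly as during training, and the model may return raw logits rather than softmaxed scores (in which case softmax makes them comparable to output_scores). run_onnx and softmax are hypothetical helpers.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; each row of the result sums to 1."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def run_onnx(model_path, batch):
    """Run an exported .onnx model on a batch of already-preprocessed images.

    Assumes a single input tensor; preprocessing must match what training used.
    """
    import onnxruntime as ort  # third-party: pip install onnxruntime

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: np.asarray(batch, dtype=np.float32)})
    return outputs[0]  # first model output, e.g. an (n, m) array


# scores = softmax(run_onnx("path/to/model.onnx", batch))
```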
Logs (--args-log and --epochs-log)
args.yml - The Args log is a YAML file that captures the input parameters of a training run. The filename can be changed using the --args-log flag.
epochs.csv - The Epochs log is a CSV file that captures the performance (loss for the training and validation datasets) of a model across its training epochs. The filename can be changed using the --epochs-log flag.
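Both logs are easy to inspect from Python. The sketch below picks the best row of the epochs log and loads the args log. The column name "val_loss" is an assumption (check the header of your own epochs.csv), rows with an empty value for the chosen column are skipped, and load_args needs the third-party PyYAML package. best_epoch and load_args are hypothetical helpers.

```python
import csv


def best_epoch(epochs_csv, metric="val_loss", mode="min"):
    """Return the epochs.csv row with the best value of `metric`.

    `epochs_csv` is an open file or any iterable of CSV lines. The column
    name "val_loss" is an assumption; check the header of your own file.
    """
    rows = [r for r in csv.DictReader(epochs_csv) if r.get(metric) not in (None, "")]
    if not rows:
        raise ValueError(f"no values found for column {metric!r}")
    pick = min if mode == "min" else max
    return pick(rows, key=lambda r: float(r[metric]))


def load_args(path):
    """Load args.yml into a dict (requires the third-party PyYAML package)."""
    import yaml

    with open(path) as f:
        return yaml.safe_load(f)


# with open("training-output/MyRun/epochs.csv", newline="") as f:
#     print(best_epoch(f))
```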
Results File (--results)
Besides the model and the log files, neuston_net TRAIN produces one or more results files. A results file contains the classification output of the model's best epoch on the validation dataset, as well as many other optional summary statistics. The flag has the following form:
--results FNAME [SERIES ...]
where FNAME is the results file's filename, and SERIES is a list of values and statistics to include.
FNAME Output Format
FNAME recognizes .mat, .h5, and .json as output formats (MATLAB, HDF5, and JSON, respectively). Additionally, {epoch} is recognized as a formatting tag; this allows the results of multiple epochs to be compared later.
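Reading a results file back in depends only on its extension. The sketch below assumes {epoch} behaves like a str.format field and that an .h5 file stores each series as a top-level dataset; neither detail is confirmed on this page. scipy.io.loadmat and h5py.File are the standard readers for the other two formats, and both are imported only when needed. results_filenames and load_results are hypothetical helpers.

```python
import json
import os


def results_filenames(fname, epochs):
    """Expand an FNAME pattern that contains the {epoch} tag."""
    return [fname.format(epoch=e) for e in epochs]


def load_results(path):
    """Load a results file into a dict, choosing the reader by file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".json":
        with open(path) as f:
            return json.load(f)
    if ext == ".mat":
        from scipy.io import loadmat  # third-party: scipy

        data = loadmat(path, squeeze_me=True)
        # Drop MATLAB metadata entries such as __header__ and __version__.
        return {k: v for k, v in data.items() if not k.startswith("__")}
    if ext == ".h5":
        import h5py  # third-party: h5py

        # Assumes each series is stored as a top-level dataset.
        with h5py.File(path, "r") as f:
            return {key: f[key][()] for key in f.keys()}
    raise ValueError(f"unsupported results format: {ext!r}")
```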
Automatic SERIES
These Series are always included in a results file:
model_id - the MODEL-ID of the trained model
timestamp - the timestamp at which model training was initiated
class_labels - the list of m class labels. The order of this list is consistent with all further per-class list orderings
input_classes - the list of n true input classes, as integer indices corresponding to their respective class_labels label
output_classes - the list of n predicted output classes, also as integer indices
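Because input_classes and output_classes are indices into class_labels, decoding them and finding misclassified images is straightforward. This sketch assumes the results have already been loaded into a dict keyed by series name (as a JSON results file would be); the labels below are made-up placeholders, and decode and misclassified are hypothetical helpers.

```python
def decode(results):
    """Map input_classes / output_classes indices back to class_labels strings."""
    labels = list(results["class_labels"])
    true = [labels[int(i)] for i in results["input_classes"]]
    pred = [labels[int(i)] for i in results["output_classes"]]
    return true, pred


def misclassified(results):
    """Positions (0..n-1) of images whose predicted class differs from the true class."""
    pairs = zip(results["input_classes"], results["output_classes"])
    return [i for i, (t, p) in enumerate(pairs) if int(t) != int(p)]


results = {
    "class_labels": ["class_a", "class_b", "class_c"],  # placeholder labels
    "input_classes": [0, 1, 2, 1],
    "output_classes": [0, 2, 2, 1],
}
print(misclassified(results))  # prints [1]
```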
Optional SERIES that may be specified
These Series must be specified in order to be included in a results file when --results is manually invoked.
image_basenames - list of n image filenames, as determined by SRC and the runtime dataset configuration
image_fullpaths - as above, except the images' full paths are used
output_scores - an n-by-m matrix of scores, where n is the number of inputs/images and m is the number of classes. The values for a given image are softmaxed so that they sum to 1; higher values indicate a higher classification confidence from the model.
output_winscores - a list of the highest classification confidence score for each image
confusion_matrix - an m-by-m matrix comparing the images' true labels against their predicted labels
classes_by_{count|f1|recall|precision} - list of classes ordered by count, f1 score, recall score, or precision score: classes_by_count, classes_by_f1, classes_by_recall, classes_by_precision
{f1|recall|precision}_{macro|weighted|perclass} - f1, recall, or precision score using macro averaging, weighted averaging, or one score per class (average=None in sklearn). See sklearn.metrics.f1_score for more details: f1_macro, f1_weighted, f1_perclass, recall_macro, recall_weighted, recall_perclass, precision_macro, precision_weighted, precision_perclass
{counts|val_counts|train_counts}_perclass - list of counts (total, or for the training or validation dataset only) on a per-class basis: counts_perclass, val_counts_perclass, train_counts_perclass
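Most of these series can be recomputed from output_scores and input_classes alone. The NumPy sketch below follows sklearn's definitions for per-class, macro, and weighted scores, with three assumptions: confusion-matrix rows are true classes and columns are predicted classes (sklearn's convention), classes_by_f1 is ordered from highest to lowest f1, and the macro average is taken over all m classes (sklearn by default averages only over labels present in the true or predicted classes, so results can differ when a class is absent). summarize_validation is a hypothetical helper.

```python
import numpy as np


def summarize_validation(output_scores, input_classes):
    """Hypothetical helper: derive several results series from scores and true labels."""
    scores = np.asarray(output_scores, dtype=float)  # shape (n, m)
    true = np.asarray(input_classes, dtype=int)      # shape (n,)
    m = scores.shape[1]

    pred = scores.argmax(axis=1)      # output_classes
    winscores = scores.max(axis=1)    # output_winscores

    # Rows = true class, columns = predicted class.
    cm = np.zeros((m, m), dtype=int)
    np.add.at(cm, (true, pred), 1)

    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)          # images per true class
    predicted = cm.sum(axis=0)        # images per predicted class
    # Division by zero yields 0, matching sklearn's default zero_division behavior.
    precision = np.divide(tp, predicted, out=np.zeros(m), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros(m), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros(m), where=denom > 0)

    return {
        "output_classes": pred,
        "output_winscores": winscores,
        "confusion_matrix": cm,
        "val_counts_perclass": np.bincount(true, minlength=m),
        "precision_perclass": precision,
        "recall_perclass": recall,
        "f1_perclass": f1,
        "f1_macro": f1.mean(),
        "f1_weighted": np.average(f1, weights=support),
        "classes_by_f1": np.argsort(-f1, kind="stable"),  # class indices, best first
    }
```

Since these scores describe the validation set, the per-class counts computed here correspond to val_counts_perclass rather than counts_perclass.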
Creating multiple result files
It is possible to invoke the --results flag multiple times, each time specifying a new results FNAME and set of SERIES. This can be used to output multiple file formats at the same time, and/or to specify a different set of series for different files.
Example
The flags --results short_results.json --results full_results.h5 image_fullpaths output_scores would result in two files being generated.
short_results.json - contains only model_id, timestamp, class_labels, input_classes, and output_classes (the always-included series)
full_results.h5 - additionally contains the large n-by-m matrix of scores and the n-length list of image full paths
Default behavior
If --results is not specified at runtime, by default neuston_net TRAIN acts as though it were specified as --results results.mat image_basenames output_scores counts_perclass confusion_matrix f1_perclass f1_weighted f1_macro.
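A repeatable flag of the form --results FNAME [SERIES ...] maps naturally onto argparse's nargs="+" combined with action="append". The sketch below shows one way such a flag could be declared and combined with the default and the always-included series; it is an illustration, not necessarily how neuston_net declares the option, and parse_results is a hypothetical helper.

```python
import argparse

ALWAYS_INCLUDED = ["model_id", "timestamp", "class_labels", "input_classes", "output_classes"]
DEFAULT_RESULTS = ["results.mat", "image_basenames", "output_scores", "counts_perclass",
                   "confusion_matrix", "f1_perclass", "f1_weighted", "f1_macro"]


def parse_results(argv):
    """Return a list of (FNAME, series) pairs, one per --results occurrence."""
    parser = argparse.ArgumentParser()
    # nargs="+" takes FNAME plus zero or more SERIES; action="append" keeps
    # a separate list for each occurrence of the flag.
    parser.add_argument("--results", nargs="+", action="append")
    args = parser.parse_args(argv)
    specs = args.results or [DEFAULT_RESULTS]  # default when the flag is absent
    return [(fname, ALWAYS_INCLUDED + [s for s in series if s not in ALWAYS_INCLUDED])
            for fname, *series in specs]
```

Parsing the example flags from the previous section yields one entry for short_results.json with only the always-included series, and one for full_results.h5 that adds image_fullpaths and output_scores.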
Other Files
These files do not have manual/flaggable options.
training_images.list - list of training images
validation_images.list - list of validation images
chkpts/ - a directory of best-epoch checkpoints. The checkpoint files are interoperable with the final .ptl file.
logs/default/version_0 - contains runtime versions of args.yml and epochs.csv (named hparams.yaml and metrics.csv, respectively). Upon completion of training, these two files are copied and renamed into the main OUTDIR directory. If a training run with the same OUTDIR is started, the "version" number is incremented and a new version directory is created. Other files in the OUTDIR directory will be overwritten.
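When an OUTDIR has been reused, the newest runtime logs live in the highest-numbered version directory. The sketch below finds it using a numeric comparison (so version_10 sorts after version_9) and reads an image list, assuming the .list files contain one image path per line. latest_version_dir and read_image_list are hypothetical helpers.

```python
import os
import re


def latest_version_dir(outdir):
    """Return the newest logs/default/version_N directory under OUTDIR, or None."""
    base = os.path.join(outdir, "logs", "default")
    if not os.path.isdir(base):
        return None
    versions = []
    for name in os.listdir(base):
        match = re.fullmatch(r"version_(\d+)", name)
        if match and os.path.isdir(os.path.join(base, name)):
            versions.append((int(match.group(1)), name))  # compare numerically
    if not versions:
        return None
    return os.path.join(base, max(versions)[1])


def read_image_list(path):
    """Read training_images.list / validation_images.list (assumed one path per line)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```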