Overview - TobiasSchmidtDE/DeepL-MedicalImaging GitHub Wiki

Overview

The repository is mainly structured into five parts:

  • root: devops code
  • data: training data
  • logs: training logs
  • models: trained models
  • src: training code

DevOps

See our Getting Started Guide for how to set up a clean version of our pipeline/repository, including the dataset, the .env environment file, and the Docker container.

Data

After executing the download_dataset.sh -a script, the repository will have a data folder containing a subfolder each for chexpert and chestxray14. See our "Data-Set-&-Preprocessing-Methods" section for more details on the characteristics of the datasets. Each dataset comes in different versions. CheXpert is available in a "full" and a "dev" version, both providing the downscaled versions of the images (sizes ranging from 320x390 to 320x320). However, the "dev" version only contains a training set of about 15k samples instead of 220k. For the ChestX-ray14 dataset there are two versions corresponding to the image sizes: 256 and 512.

Logs

Contains the logs written during and directly after training:

  • experiment-log.json: Contains the id, name, benchmark, and train/val/test results, including all metrics calculated during or directly after training. This file lists all experiments/models that have also been pushed to the wiki and therefore have a visualization of their logs.
  • unvalidated-experiment-log.json: Same as above, but this file contains the experiments/models that haven't been pushed to the wiki yet.

Important note: We had a bug in the calculation of our metrics after trying to add support for NaN labels (masked loss function), so we had to reevaluate our models. We therefore reevaluated all models that were trained after moving to the Garching server and added the following log files:

  • reevaluation_experiments.json: Same as the previous log files, but each experiment has additional attributes/dictionaries "test_again" and "val_again" that contain the correct values for all metrics.
  • epoch_reevaluation_experiments.json: Same as above, but each experiment is reevaluated on the models that were saved after every finished epoch. The attribute epoch_model defines from which epoch the model weights were taken to reevaluate the experiment.

Models

The models folder is where the results of each training run are saved, including the TensorBoard logs as well as model weights and checkpoints. Additionally, all final model weights have been uploaded to a GCP instance. The IDs of these save files are referenced in experiment-log.json and the similar log files.

On both the Klinik and Garching servers this models folder is not empty. The most recent models were trained on the Garching server and can only be found there.

Each subfolder in the models folder represents one experiment and is therefore named after it, usually a combination of the model architecture name and the benchmark name.

Model architectures are:

  • DenseNet121 / DenseNet169
  • ResNet152V2
  • InceptionV3
  • InceptionResNetV2
  • Xception

The benchmark name follows the form: [Dataset]_[Loss]_[ClassWeight]_[Epochs]_[Batchsize]_[Cropping]_[Classes]_[Augmentations]_[Transformations]_[Uncertainty Encoding]_[ImageDimension]_[Datasetsplit]_[LearningRate]_[LearningFactor]_[Optimizer]_[Upsampling]

  • Dataset: The name of the dataset
  • Loss: Abbreviation of the loss function, e.g. BCE, WBCE, CWBCE
  • ClassWeight: Whether the class weights have been regularized or not. Either "L1Normed" or omitted
  • Epochs: Number of epochs, e.g. "E3" for three epochs
  • Batchsize: Batch size, e.g. "B32" for batch size 32
  • Cropping: Whether cropping was used or not. Either "C0" if not used or "C1" if used.
  • Classes: The number of classes the model was trained on, e.g. "N12" for a model trained on 12 classes.
  • Augmentations: The image augmentations used. If omitted, none were used. Example values: "AugAffine" or "AugColor"
  • Transformations: Name and parameter of additional image transformations, for example sharpening: "sharp21"
  • Uncertainty Encoding: If omitted, the encoding is uzeros. Otherwise it can be "Uzero"/"Uone", or "Uxy" where x is the number of uzero-encoded classes and y the number of uones-encoded classes, e.g. "U75"
  • ImageDimension: The dimension of the images the network is trained on, e.g. "D256" for images of dimension 256x256
  • Datasetsplit: If omitted, the old split is used, where only 60% of the data is used for training. Otherwise it specifies the percentages of the train/val split, e.g. "D9505" means 95% of the data is training data and 5% validation data. The test set is then taken from the CheXpert test set.
  • Learning Rate: The learning rate used to initialize the optimizer, in the form "xLRy", which corresponds to xe-y, e.g. "1LR4" is learning rate 1e-4
  • Learning Factor: The factor for learning rate decay. "LFx" means factor 0.x, e.g. "LF5" corresponds to factor 0.5
  • Optimizer: Name of the optimizer, e.g. "Adam" or "SGD". If omitted, Adam was used.
  • Upsampling: Training used an upsampled dataset if "_upsampled" is present. If omitted, no upsampling was used.
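To illustrate the convention, the naming scheme can be sketched as a small helper that assembles a benchmark name from a simplified subset of the fields above (the function, its parameter names, and the defaults are assumptions for illustration, not the repository's actual code):

```python
def benchmark_name(dataset, loss, epochs, batch_size, cropping, n_classes,
                   image_dim, learning_rate_exp=(1, 4), learning_factor=5,
                   class_weight=None, augmentation=None, optimizer=None,
                   upsampled=False):
    """Assemble a benchmark name following the convention
    [Dataset]_[Loss]_[ClassWeight]_[Epochs]_[Batchsize]_[Cropping]_...
    Omitted (None/False) parts simply drop out of the name."""
    x, y = learning_rate_exp  # "xLRy" corresponds to learning rate x * 10**-y
    parts = [
        dataset,
        loss,
        class_weight,                      # e.g. "L1Normed", or None
        f"E{epochs}",
        f"B{batch_size}",
        f"C{1 if cropping else 0}",
        f"N{n_classes}",
        augmentation,                      # e.g. "AugAffine", or None
        f"D{image_dim}",
        f"{x}LR{y}",
        f"LF{learning_factor}",
        optimizer,                         # e.g. "SGD"; None means Adam
        "upsampled" if upsampled else None,
    ]
    return "_".join(p for p in parts if p)

name = benchmark_name("Chexpert", "CWBCE", epochs=3, batch_size=32,
                      cropping=False, n_classes=12, image_dim=256)
```

With these arguments, `name` comes out as "Chexpert_CWBCE_E3_B32_C0_N12_D256_1LR4_LF5", matching the pattern of the real folder names.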

Code

Experiments & Benchmarks

Our code base is mainly structured around the Benchmark and the Experiment class. A benchmark defines the dataset that is used as well as all hyperparameters and training specifications; see the description above for a high-level summary or the documentation in the class definition. An experiment is an instantiation of a benchmark together with a model and is responsible for training, evaluation, and logging/saving the results. The class definitions for both can be found in src/architectures/benchmarks.py.
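The relationship between the two classes can be sketched roughly as follows (class and attribute names here are illustrative stand-ins, not the actual API in src/architectures/benchmarks.py):

```python
from dataclasses import dataclass, field

@dataclass
class Benchmark:
    # Bundles the dataset reference and all hyperparameters of a run.
    dataset: str
    loss: str = "BCE"
    epochs: int = 3
    batch_size: int = 32
    learning_rate: float = 1e-4

@dataclass
class Experiment:
    # Ties a benchmark to a concrete model and collects the results.
    benchmark: Benchmark
    model_name: str
    results: dict = field(default_factory=dict)

    def log(self, split, metrics):
        # Store e.g. {"auc": 0.82} under "train"/"val"/"test".
        self.results[split] = metrics

bench = Benchmark(dataset="chexpert", loss="CWBCE")
exp = Experiment(benchmark=bench, model_name="DenseNet121")
exp.log("val", {"auc": 0.82})
```

The point of the split is that one benchmark can be reused across many experiments, so runs that share hyperparameters stay comparable.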

Architectures

To be able to use any of the common CNN architectures already defined in Keras, we created the SimpleBaseArchitecture function, which builds a classification model with DenseNet (or any other supported network) as its backbone. These models are by default initialized with pretrained ImageNet weights. The function can be found in src/architectures/simple/simple_base.py.

To achieve state-of-the-art performance we tried different ensembling methods: stacking models with an additional meta-learner that learns the best combination of the model outputs, and a simple weighted-averaging ensemble. These are defined in the src/architectures/adv/ensemble_existing notebook.
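The weighted-averaging variant is the simpler of the two and can be sketched in a few lines of pure Python (a stand-in for the notebook's actual code; the weights and predictions are made-up values):

```python
def weighted_average_ensemble(predictions, weights):
    """Combine per-model class probabilities with a weighted average.

    predictions: list of per-model probability vectors (one per model)
    weights: one non-negative weight per model; normalized to sum to 1
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    n_classes = len(predictions[0])
    return [
        sum(w * pred[c] for w, pred in zip(norm, predictions))
        for c in range(n_classes)
    ]

# Two models, three classes; the second model is trusted twice as much.
ensemble = weighted_average_ensemble(
    [[0.9, 0.2, 0.4], [0.6, 0.5, 0.1]],
    weights=[1.0, 2.0],
)
```

Stacking replaces the fixed weights with a small meta-learner trained on the base models' outputs, which is what the notebook explores in addition.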

We also provide a set of functions to reload and reevaluate experiments that we have already executed. The code for this can be found in src/architectures/simple/load_model together with the reevaluation notebook, which shows how these functions can be used.

Training Scripts

For executing our training we either used the experiment_exec notebook for quick iterations on our training pipeline or the experiment_exec.py script, which was intended for batch-like execution of multiple experiments. Both can be found in src/architectures/simple. To allow for easier definition of batches of experiments we also created benchmark_definitions.py (in src/architectures/benchmarks), which provides tools for instantiating a set of different benchmarks.

Dataset Handling

How we load the dataset is specified by our ImageDataGenerator class. This class is responsible for loading the images from disk, transforming them (rescaling, cropping, etc.), applying augmentations, and preprocessing the label.csv file. The label preprocessing mainly covers the uncertainty encodings and also determines how we handle NaN values or upsample the dataset. The class definition can be found in src/datasets/generator.py.
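The idea behind the uncertainty encodings can be illustrated with a small pure-Python sketch (the CheXpert convention marks uncertain labels as -1; the function and parameter names here are illustrative, not the generator's real API):

```python
def encode_labels(labels, uncertain_as=0.0):
    """Map raw label values to training targets.

    1 / 0 stay positive/negative, -1 (uncertain) is replaced by
    `uncertain_as` ("uzeros" -> 0.0, "uones" -> 1.0), and NaN is kept
    as None so a masked loss function can ignore it.
    """
    encoded = []
    for value in labels:
        if value is None or value != value:   # None, or NaN (NaN != NaN)
            encoded.append(None)              # masked out by the loss
        elif value == -1:
            encoded.append(uncertain_as)
        else:
            encoded.append(float(value))
    return encoded

# "uzeros": uncertain findings are treated as negative.
targets = encode_labels([1, 0, -1, float("nan")], uncertain_as=0.0)
```

Per-class mixes such as "Uxy" would then simply apply `uncertain_as=0.0` to some label columns and `uncertain_as=1.0` to others.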

An example on how to use the data generator class can be found in the same folder in the distribution_generator notebook. The folder also contains our dataset exploration notebooks and a notebook for analyzing and visualizing the class distribution of a given ImageDataGenerator.

The src/datasets also contains our data_augmentations.py where we defined all image augmentations that we tried out.

Metrics and Loss Functions

The src/metrics folder contains all our custom code for metrics and loss functions. Specifically, it defines the F2 score, a SingleClassMetric wrapper that can take any Keras metric and apply it to only one class, and a NaNWrapper class that was meant to mask any NaN values when calculating a metric. loss.py defines our Custom Weighted Binary Cross Entropy (CWBCE) loss function as well as the compute_class_weight function, which calculates the class weights based on the class distribution in a given data generator.
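The core idea of a class-weighted binary cross-entropy can be sketched in pure Python (this is a generic weighted BCE for illustration, not necessarily the exact CWBCE formula implemented in loss.py):

```python
import math

def weighted_bce(y_true, y_pred, pos_weight=1.0, neg_weight=1.0, eps=1e-7):
    """Mean binary cross-entropy with separate weights for the positive
    and negative terms, so rare positive findings can be emphasized."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(pos_weight * t * math.log(p)
                   + neg_weight * (1.0 - t) * math.log(1.0 - p))
    return total / len(y_true)

# pos_weight > 1 penalizes missed positives more than false alarms.
loss = weighted_bce([1.0, 0.0], [0.9, 0.1], pos_weight=2.0)
```

In the repository the weights come from the class distribution of the data generator (via compute_class_weight) rather than being hand-picked as here.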

Additionally, we provide the plot_ROC notebook for plotting ROC curves and finding the optimal threshold.

Preprocessing

The src/preprocessing folder contains our code for cropping and normalizing/transforming images as well as our custom code for splitting the dataset. To preprocess the complete dataset before training, preprocess.py can be used. When the same preprocessing is reused across multiple experiments, this speeds up the training.
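As a rough illustration of such preprocessing steps, here is min-max normalization and a center crop on a grayscale image, represented as a nested list of pixel values (the repository's actual code operates on image files and arrays; function names here are made up):

```python
def min_max_normalize(image):
    """Scale pixel values into [0, 1]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    scale = (hi - lo) or 1  # avoid dividing by zero on flat images
    return [[(p - lo) / scale for p in row] for row in image]

def center_crop(image, size):
    """Cut a size x size patch out of the middle of the image."""
    h, w = len(image), len(image[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]

img = [[0, 50, 100, 150]] * 4          # 4x4 dummy "image"
cropped = center_crop(min_max_normalize(img), 2)
```

Doing this once up front (as preprocess.py does) avoids repeating the same per-image work in every training run.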

Utils

In our src/utils section we defined how to store models in GCP, how to save models and generate the log files, as well as our implementations of GradCAM and CRM for visualizing our models.

Test

In src/tests you can find the code we used to generate wiki entries for our experiments, as well as a notebook we used to compare the results of all our models.

Demo Application

The code for the demo application can be found in the app/main.py file.