Data Augmentation

guijacquemet edited this page Jul 1, 2020 · 1 revision

Short introduction to data augmentation

Data augmentation is a strategy used to artificially increase the size of a training dataset. It can improve training by increasing the variability of the examples the network sees. This is useful when the available dataset is small: without augmentation, a network can quickly memorise every example in a small dataset (overfitting) instead of learning features that generalise. Augmentation is especially valuable when training datasets need to be manually labelled. Augmentation is not necessary for training, and if your training dataset is large you should disable it.

Data augmentation is not a magic solution and may also introduce issues. Therefore, we recommend that you train your network both with and without augmentation, and use the quality control (QC) section to check whether augmentation improves overall performance.

Common augmentation strategies include rotating and flipping images, but other strategies (zoom, perspective transforms, elastic distortions or shearing) can also be used.
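To make the rotation and flipping strategies concrete, here is a minimal, dependency-free sketch that treats an image as a nested list of pixel values. The function names (`rotate90`, `flip_horizontal`, `all_orientations`) are hypothetical and are not taken from the notebooks; a real pipeline would operate on image arrays and would apply the same transform to each image and its label.

```python
def rotate90(image):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip_horizontal(image):
    """Mirror a 2D image left to right."""
    return [row[::-1] for row in image]


def all_orientations(image):
    """Return the 8 variants obtained from 4 rotations, each with and without a flip.

    Assumption for this sketch: only lossless 90-degree rotations and
    mirror flips are used, so no interpolation is needed.
    """
    variants = []
    current = [list(row) for row in image]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


# One training image becomes eight.
example = [[1, 2],
           [3, 4]]
augmented = all_orientations(example)
```

In this sketch a single image yields eight variants, which is how augmentation increases the effective size of a small dataset. Whether those variants are meaningful depends on the data: flips and rotations are only appropriate when orientation carries no information.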

Augmentor

Data augmentation (in one form or another) can easily be enabled or disabled in all our notebooks. We also provide a separate notebook, based on Augmentor, that can be used to augment or crop a dataset.
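As a rough illustration of what an Augmentor-based workflow can look like, the sketch below builds a small pipeline with the Augmentor Python package. It is not the code from our notebook: the helper name `build_pipeline`, the directory arguments and all probabilities and ranges are assumptions chosen for illustration.

```python
def build_pipeline(image_dir, output_dir, mask_dir=None):
    """Build an illustrative Augmentor pipeline (hypothetical helper).

    image_dir, output_dir and mask_dir are placeholder paths; the
    probabilities and ranges below are arbitrary example values.
    """
    import Augmentor  # third-party package: pip install Augmentor

    pipeline = Augmentor.Pipeline(source_directory=image_dir,
                                  output_directory=output_dir)
    if mask_dir is not None:
        # Pair each image with its label so both receive the same transform.
        pipeline.ground_truth(mask_dir)

    # Each operation is applied with the given probability per sample.
    pipeline.rotate(probability=0.5, max_left_rotation=10, max_right_rotation=10)
    pipeline.flip_left_right(probability=0.5)
    pipeline.flip_top_bottom(probability=0.5)
    pipeline.zoom(probability=0.3, min_factor=1.1, max_factor=1.3)
    return pipeline
```

Calling `build_pipeline("path/to/images", "augmented").sample(200)` would then ask Augmentor to generate 200 augmented images from the source folder (the paths here are placeholders).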

Augmentor was described in the following article:

Marcus D Bloice, Peter M Roth, Andreas Holzinger, Biomedical image augmentation using Augmentor, Bioinformatics, https://doi.org/10.1093/bioinformatics/btz259

Please also cite this original paper when using Augmentor.