Evaluation

We run multiple experiments to compare our preprocessing methods and evaluate whether they improve the performance of our models, measured as AUC on the validation and test sets.
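
To make the comparisons below concrete, here is a minimal sketch of how per-label and mean AUC could be computed for a single run. The pathology names and the `y_true`/`y_pred` arrays are placeholder assumptions for illustration, not values from our training pipeline.

```python
# Sketch: per-label and mean AUC for a multi-label classifier (assumed setup).
import numpy as np
from sklearn.metrics import roc_auc_score

# Assumed placeholder data: ground-truth labels and predicted probabilities
# for 100 samples and a few example pathologies (hypothetical names).
labels = ["Cardiomegaly", "Edema", "Consolidation", "Atelectasis", "Pleural Effusion"]
y_true = np.random.randint(0, 2, size=(100, len(labels)))  # 0/1 ground truth
y_pred = np.random.rand(100, len(labels))                  # predicted probabilities

# Per-label AUC, then the unweighted mean over all labels.
per_label_auc = {name: roc_auc_score(y_true[:, i], y_pred[:, i])
                 for i, name in enumerate(labels)}
mean_auc = np.mean(list(per_label_auc.values()))

for name, auc in per_label_auc.items():
    print(f"{name}: {auc:.3f}")
print(f"mean AUC: {mean_auc:.3f}")
```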

Augmentation:
  • we obtain better results without augmentation
  • interestingly, augmentation yields a better validation AUC but a worse test AUC
  • a possible explanation: in the test set all images are centered and have no blank borders, whereas the training set (and therefore the validation set split from it) contains unclean images
  • we were able to confirm the negative effect of augmentation by reimplementing the JF Healthcare architecture, where models trained with augmentation also performed worse (a sketch of a typical augmentation setup follows after this list)
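
As a reference for what "augmentation" means here, below is a minimal sketch of an image-augmentation setup using Keras' ImageDataGenerator. The specific transforms and parameter values are illustrative assumptions, not the exact configuration used in our training runs.

```python
# Sketch: a typical augmentation configuration for chest X-ray training images.
# The transform choices and ranges are assumptions for illustration only.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenting_datagen = ImageDataGenerator(
    rotation_range=10,        # small random rotations
    width_shift_range=0.05,   # slight horizontal shifts
    height_shift_range=0.05,  # slight vertical shifts
    zoom_range=0.1,           # mild zoom in/out
    horizontal_flip=True,     # mirror images left/right
    rescale=1.0 / 255,        # scale pixel values to [0, 1]
)

# No-augmentation baseline: only rescaling, no random transforms.
plain_datagen = ImageDataGenerator(rescale=1.0 / 255)
```

Random shifts and zooms move image content relative to the border, which is one plausible reason why augmented models generalize worse to the centered, border-free test images mentioned above.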
Upsampling:
  • clearly yields better performance than training without it (see the sketch below)
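
The term is ambiguous from this page alone; assuming "upsampling" refers to oversampling under-represented positive samples in the training split (rather than increasing image resolution), a minimal pandas-based sketch could look like the following. The dataframe layout and column names are assumptions.

```python
# Sketch: upsampling under-represented positive samples in a training dataframe.
# Assumes a hypothetical dataframe with one binary column per pathology.
import pandas as pd

def upsample_positives(df: pd.DataFrame, label: str, target_ratio: float = 0.5,
                       seed: int = 42) -> pd.DataFrame:
    """Duplicate positive rows for `label` until they make up `target_ratio` of the data."""
    positives = df[df[label] == 1]
    negatives = df[df[label] == 0]
    # Number of positive rows needed to reach the target ratio.
    n_needed = int(target_ratio * len(negatives) / (1 - target_ratio))
    if len(positives) == 0 or n_needed <= len(positives):
        return df
    extra = positives.sample(n=n_needed - len(positives), replace=True, random_state=seed)
    return pd.concat([df, extra]).sample(frac=1, random_state=seed).reset_index(drop=True)

# Hypothetical usage on a toy dataframe:
train_df = pd.DataFrame({"path": [f"img_{i}.jpg" for i in range(10)],
                         "Edema": [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]})
balanced_df = upsample_positives(train_df, label="Edema")
print(balanced_df["Edema"].value_counts())
```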
Uncertainty Encodings:
  • no comparison possible yet
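
For context, "uncertainty encodings" here presumably refers to how uncertain labels (commonly stored as -1 in CheXpert-style label files) are mapped before training. A minimal sketch of two common strategies, often called U-zeros and U-ones, follows; whether these are exactly the encodings we will compare is an assumption.

```python
# Sketch: two common encodings for uncertain (-1) labels in CheXpert-style data.
# Column names and the -1 convention are assumptions for illustration.
import pandas as pd

def encode_uncertain(df: pd.DataFrame, label_columns: list, strategy: str = "zeros") -> pd.DataFrame:
    """Map uncertain labels (-1) to 0 ("zeros") or 1 ("ones")."""
    encoded = df.copy()
    replacement = 0 if strategy == "zeros" else 1
    encoded[label_columns] = encoded[label_columns].replace(-1, replacement)
    return encoded

# Hypothetical usage:
labels_df = pd.DataFrame({"Edema": [1, -1, 0], "Consolidation": [-1, 0, 1]})
u_zeros = encode_uncertain(labels_df, ["Edema", "Consolidation"], strategy="zeros")
u_ones = encode_uncertain(labels_df, ["Edema", "Consolidation"], strategy="ones")
```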
Best Metric:
  • no comparison possible yet