Evaluation
We run multiple experiments to compare our preprocessing methods and evaluate whether they improve the performance of our models.
Augmentation:
- we achieve better results without augmentation
- interestingly, with augmentation we get a better val AUC but a worse test AUC
- a possible explanation is that all images in the test set are centered and have no blank borders, whereas the training set (and consequently the val set split from it) contains unclean images; augmentation may therefore help the model fit the unclean val images while generalizing worse to the clean test images
- we were able to verify the worse results with augmentation by reimplementing the JF Healthcare architecture, where models trained with augmentation also performed worse (a sketch of a typical augmentation setup follows after this list)
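To make explicit what we mean by augmentation, here is a minimal sketch of a typical geometric augmentation setup. It assumes a tf.keras pipeline and uses `ImageDataGenerator`; the parameter values are illustrative assumptions, not our exact configuration. Note that shifts and rotations with constant fill produce exactly the kind of blank borders discussed above, which the centered test images do not have.

```python
# Minimal sketch of a geometric augmentation setup (illustrative values,
# not the exact configuration used in our experiments).
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # normalize pixel values to [0, 1]
    rotation_range=10,       # small random rotations
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    zoom_range=0.1,          # random zoom in/out
    fill_mode="constant",    # fill shifted/rotated regions with a constant value,
    cval=0.0,                # i.e. black borders similar to the unclean training images
)

# Validation and test data should only be rescaled, never augmented.
val_datagen = ImageDataGenerator(rescale=1.0 / 255)
```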
Upsampling:
- clearly improves performance (a sketch of a simple upsampling scheme follows below)
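As a rough illustration, assuming "upsampling" refers to oversampling underrepresented samples in the training split, a simple scheme could look like the following. The column name and the single-label simplification are hypothetical; in practice our data is multi-label.

```python
# Minimal sketch of upsampling: duplicate minority-class rows in the training
# dataframe until both classes are equally frequent. Illustrative only.
import pandas as pd

def upsample(df: pd.DataFrame, label_col: str, seed: int = 42) -> pd.DataFrame:
    """Oversample the minority class with replacement to match the majority count."""
    counts = df[label_col].value_counts()
    majority_label = counts.idxmax()
    majority = df[df[label_col] == majority_label]
    minority = df[df[label_col] != majority_label]

    # Sample minority rows with replacement up to the majority count, then shuffle.
    minority_upsampled = minority.sample(n=len(majority), replace=True, random_state=seed)
    return pd.concat([majority, minority_upsampled]).sample(frac=1, random_state=seed)

# Example usage on a hypothetical training dataframe:
# train_df = upsample(train_df, label_col="Cardiomegaly")
```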
Uncertainty Encodings:
- no comparison possible yet
Best Metric:
- no comparison possible yet