Final Results - TobiasSchmidtDE/DeepL-MedicalImaging GitHub Wiki

Test Set Results

AUC scores on the CheXpert test set (for the exact specification of our models, see the end of this page):

Pathology	P: Dataset Paper	P: Ensemble	P: IEEE	RI: JF Healthcare	Our Best Single Model	Our Best Ensemble
Enlarged Cardiomediastinum	-	-	0.71	0.901	0.53	0.63
Cardiomegaly	0.90	0.910	0.88	0.623	0.79	0.80
Lung Opacity	-	-	0.76	0.787	0.91	0.93
Lung Lesion	-	-	0.80	0.922	0.98	0.94
Edema	0.92	0.958	0.87	0.859	0.92	0.93
Consolidation	0.90	0.957	0.77	0.917	0.92	0.92
Pneumonia	-	-	0.79	0.858	0.83	0.80
Atelectasis	0.85	0.909	0.72	0.779	0.83	0.85
Pneumothorax	-	-	0.86	0.826	0.85	0.87
Pleural Effusion	0.97	0.964	0.90	0.885	0.92	0.93
Pleural Other	-	-	0.80	0.919	0.98	0.97
Average	0.908	0.940	0.812	0.843	0.860	0.870

P: Dataset Paper: the CheXpert paper that introduced the dataset; results as reported in the paper
P: Ensemble: the best ensemble on the leaderboard; results as reported in the paper
P: IEEE: a paper that reports results on all 12 classes; results as reported in the paper
RI: JF Healthcare: the best leaderboard entry whose authors made their code publicly available; results obtained by re-implementing the model with the authors' code and modifying it for 12 classes

Note: although we also trained the models on the pathology 'Fracture', it is not included in this evaluation because the test set contains no examples of this pathology.
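The exclusion of 'Fracture' follows from how AUC is defined: it compares the scores of positive examples against those of negative examples, so it cannot be computed for a class with no positives. The following is a minimal sketch of a per-class AUC evaluation that skips such classes. It is not the evaluation code used for the tables above; the function names and the toy data are hypothetical and chosen for illustration only.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: the fraction of (positive, negative) pairs in which
    the positive example gets the higher score. Ties count as half.
    Returns None when only one class is present, since AUC is undefined."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_class_auc(y_true, y_score, class_names):
    """y_true and y_score are lists of rows with one column per class."""
    return {
        name: roc_auc([row[j] for row in y_true], [row[j] for row in y_score])
        for j, name in enumerate(class_names)
    }


def mean_auc(aucs):
    """Average over the classes for which AUC is defined."""
    defined = [v for v in aucs.values() if v is not None]
    return sum(defined) / len(defined)


# Toy data (hypothetical): four images, two classes.
# The second class has no positive example, like 'Fracture' in the test set.
class_names = ["Edema", "Fracture"]
y_true = [[1, 0], [0, 0], [1, 0], [0, 0]]
y_score = [[0.9, 0.2], [0.3, 0.1], [0.6, 0.4], [0.7, 0.3]]

aucs = per_class_auc(y_true, y_score, class_names)
print(aucs)            # {'Edema': 0.75, 'Fracture': None}
print(mean_auc(aucs))  # 0.75
```

The pairwise comparison is quadratic in the number of examples, which is acceptable for a test set of a few hundred images; a sort-based implementation would be preferable for larger sets.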

Validation Set Results

AUC on our own validation set:

Pathology	Our Best Single Model	Our Best Ensemble
Enlarged Cardiomediastinum	0.67	0.67
Cardiomegaly	0.87	0.87
Lung Opacity	0.75	0.75
Lung Lesion	0.79	0.79
Edema	0.86	0.85
Consolidation	0.77	0.76
Pneumonia	0.75	0.74
Atelectasis	0.72	0.69
Pneumothorax	0.86	0.86
Pleural Effusion	0.87	0.87
Pleural Other	0.81	0.82
Fracture	0.80	0.79
Average	0.86	0.86

Threshold Evaluation

We also determine the optimal threshold on the validation set for our best single model and report the resulting precision and recall per pathology:

Pathology	Precision	Recall
Enlarged Cardiomediastinum	0.0750	0.6180
Cardiomegaly	0.3762	0.7986
Lung Opacity	0.6514	0.6809
Lung Lesion	0.0988	0.7148
Edema	0.5907	0.7805
Consolidation	0.1308	0.6970
Pneumonia	0.2003	0.6823
Atelectasis	0.4468	0.6563
Pneumothorax	0.25411	0.7859
Pleural Effusion	0.7051	0.7899
Pleural Other	0.07022	0.7301
Fracture	0.1115	0.7228
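This page does not state which criterion defines the "optimal" threshold. As one possible reading, the sketch below sweeps over candidate thresholds for a single pathology and keeps the one that maximizes the F1 score. The F1 criterion, the function names, and the toy data are all assumptions made for illustration; they are not taken from the project code.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    preds = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(labels, scores):
    """Return (threshold, precision, recall, f1) for the threshold with the
    highest F1. Assumption: F1 is the selection criterion."""
    best = (0.0, 0.0, 0.0, -1.0)
    for t in sorted(set(scores)):  # every observed score is a candidate
        p, r = precision_recall(labels, scores, t)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best[3]:
            best = (t, p, r, f1)
    return best


# Toy validation data (hypothetical) for one pathology.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]

t, p, r, f1 = best_threshold(labels, scores)
print(t, round(p, 2), round(r, 2), round(f1, 2))  # 0.35 0.6 1.0 0.75
```

In a multi-label setting such as this one, the sweep would be repeated independently for each pathology, giving one threshold per class.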

Best Single Model Benchmark

{
'benchmark_name': '2_Chexpert_CWBCE_L1Normed_E5_B32_C0_N12_AugAffine_sharp21_U75_D256_DS9505_1LR4_LF1_Adam_Upsampled',
'dataset_name': 'chexpert_full',
'dataset_folder': 'data/chexpert/full',
'models_dir': 'models',
'epochs': 5,
'optimizer': 'Adam',
'learning_rate': 0.0001,
'lr_factor': 0.1,
'loss': 'weighted_binary_crossentropy',
'use_class_weights': False,
'label_columns': ['Enlarged Cardiomediastinum', 'Cardiomegaly', 'Lung Opacity', 'Lung Lesion', 'Edema', 'Consolidation', 'Pneumonia', 'Atelectasis', 'Pneumothorax', 'Pleural Effusion', 'Pleural Other', 'Fracture'],
'upsample_factors': {'Enlarged Cardiomediastinum': 1, 'Lung Lesion': 1, 'Pleural Other': 2, 'Fracture': 2},
'shuffle': True,
'batch_size': 32,
'dim': [256, 256],
'crop': False,
'transformations': {'unsharp_mask': {'radius': 2, 'amount': 1}},
'augmentation': 'affine',
'n_channels': 3,
'nan_replacement': 0,
'unc_value': -1,
'u_enc': [['Cardiomegaly', 'Enlarged Cardiomediastinum', 'Lung Opacity', 'Lung Lesion', 'Consolidation', 'Pneumothorax', 'Pleural Effusion'], ['Edema', 'Atelectasis', 'Fracture', 'Pleural Other', 'Pneumonia']],
'num_samples_train': 266998,
'num_samples_validation': 11596,
'num_samples_test': 234,
'split_seed': 6122156
}

Our Best Ensemble

We tried different combinations of models and different ensembling methods. Our best results were achieved by combining the following 10 models in a weighted-average ensemble. These 10 models were chosen because each of them showed the best performance on at least one class.

- DenseNet121_Chexpert_WBCE_E3_B32_C0_N12_AugAffine_U75_D256_DS9505_1LR1_LF5_SGD_Upsampled - Weight: 0.3084
- DenseNet121_Chexpert_BCE_E3_B32_C0_N12_D256_DS9505_1LR1_SGD_NoCustomMetrics - Weight: 0.1728
- DenseNet121_Chexpert_BCE_E3_B32_C0_N12_AugAffine_Uones_D256_DS9505_2LR1_LF5_SGD_Upsampled - Weight: 0.0382
- DenseNet121_Chexpert_CWBCE_L1Normed_E3_B32_C0_N12_AugAffine_hist_gb_U75_D256_DS9505_2LR4_LF5_Adam_Upsampled - Weight: 0.1624
- DenseNet121_Chexpert_BCE_E3_B32_C0_N12_AugAffine_U75_D256_DS9505_2LR1_LF5_SGD_Upsampled - Weight: 0.0544
- DenseNet121_Chexpert_CWBCE_E3_B32_C0_N12_AugAffine_U75_D256_DS9505_1LR1_LF5_SGD_Upsampled - Weight: 0.0441
- DenseNet121_Chexpert_CWBCE_L1Normed_E5_B32_C0_N12_AugAffine_sharp21_U75_D256_DS9505_1LR4_LF1_Adam_Upsampled - Weight: 0.0114
- DenseNet121_Chexpert_CWBCE_L1Normed_E3_B32_C0_N12_AugColor_sharp21_U75_D256_DS9505_5LR1_LF1_SGD_Upsampled - Weight: 0.0124
- DenseNet121_Chexpert_CWBCE_L1Normed_E3_B32_C0_N12_AugAffine_sharp21_U75_D256_DS9505_5LR1_LF1_SGD_Upsampled - Weight: 0.0006
- DenseNet121_Chexpert_BCE_E3_B32_C0_N12_AugAffine_UZEROES_D256_DS9505_2LR1_LF5_SGD_Upsampled - Weight: 0.1954
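A weighted-average ensemble combines the per-class probabilities of several models into one prediction by scaling each model's output with its weight and summing. The sketch below illustrates this under stated assumptions: each model has a single weight applied to all classes, and the weights are re-normalized to sum to 1 (the weights listed above sum to about 1.0001, presumably because of rounding). The function name, the model names `model_a` and `model_b`, and the toy predictions are hypothetical. How the weights themselves were determined is not described on this page.

```python
def weighted_average_ensemble(predictions, weights):
    """Combine per-model predictions into one weighted average.

    predictions: dict mapping model name -> list of rows (samples x classes)
    weights:     dict mapping model name -> non-negative weight
    Assumption: weights are normalized here so that they sum to 1.
    """
    total = sum(weights[name] for name in predictions)
    first = next(iter(predictions.values()))
    n_samples, n_classes = len(first), len(first[0])
    combined = [[0.0] * n_classes for _ in range(n_samples)]
    for name, rows in predictions.items():
        w = weights[name] / total
        for i, row in enumerate(rows):
            for j, prob in enumerate(row):
                combined[i][j] += w * prob
    return combined


# Toy example (hypothetical): two models, one image, two classes.
predictions = {
    "model_a": [[0.8, 0.2]],
    "model_b": [[0.4, 0.6]],
}
weights = {"model_a": 0.75, "model_b": 0.25}

combined = weighted_average_ensemble(predictions, weights)
print([[round(p, 4) for p in row] for row in combined])  # [[0.7, 0.3]]
```

Because the combination is linear, a model with a very small weight, such as 0.0006, changes the averaged probabilities only marginally.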