Final Results - TobiasSchmidtDE/DeepL-MedicalImaging GitHub Wiki
AUC scores on the CheXpert test set (see the end of this page for the exact specification of our models):
Pathology | P: Dataset Paper | P: Ensemble | P: IEEE | RI: JF Healthcare | Our Best Single Model | Our Best Ensemble |
---|---|---|---|---|---|---|
Enlarged Cardiomediastinum | - | - | 0.71 | 0.901 | 0.53 | 0.63 |
Cardiomegaly | 0.90 | 0.910 | 0.88 | 0.623 | 0.79 | 0.80 |
Lung Opacity | - | - | 0.76 | 0.787 | 0.91 | 0.93 |
Lung Lesion | - | - | 0.80 | 0.922 | 0.98 | 0.94 |
Edema | 0.92 | 0.958 | 0.87 | 0.859 | 0.92 | 0.93 |
Consolidation | 0.90 | 0.957 | 0.77 | 0.917 | 0.92 | 0.92 |
Pneumonia | - | - | 0.79 | 0.858 | 0.83 | 0.80 |
Atelectasis | 0.85 | 0.909 | 0.72 | 0.779 | 0.83 | 0.85 |
Pneumothorax | - | - | 0.86 | 0.826 | 0.85 | 0.87 |
Pleural Effusion | 0.97 | 0.964 | 0.90 | 0.885 | 0.92 | 0.93 |
Pleural Other | - | - | 0.80 | 0.919 | 0.98 | 0.97 |
Average | 0.908 | 0.940 | 0.812 | 0.843 | 0.860 | 0.870 |
P: Dataset Paper refers to the CheXpert paper that introduced the dataset, with results as reported in that paper.
P: Ensemble refers to the best ensemble on the CheXpert leaderboard, with results as reported in its paper.
P: IEEE refers to a paper that reports results on all 12 classes, with results as reported in that paper.
RI: JF Healthcare refers to the best leaderboard entry that made its code publicly available; these results were obtained by re-implementing the model with the authors' code and modifying it for 12 classes.
The averages in the P: Dataset Paper and P: Ensemble columns are taken over only the five pathologies those papers report.
Note: although we also trained the models on the pathology 'Fracture', it is excluded from this evaluation because the test set contains no positive examples of it, so its AUC is undefined.
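ROC AUC needs at least one positive and one negative case, which is why Fracture cannot be scored on this test set. The following is a minimal, dependency-free sketch of per-pathology AUC that skips such classes and averages the rest. It assumes the Average rows are unweighted means over the scored pathologies (this page does not say how they were computed), and `roc_auc` / `per_class_auc` are hypothetical helpers, not the repository's code. In practice scikit-learn's `roc_auc_score` computes the same per-class quantity.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> Optional[float]:
    """ROC AUC via the Mann-Whitney U statistic; None if only one class is present.

    Hypothetical helper for illustration, not the repository's implementation.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None  # AUC is undefined without both positives and negatives
    # Rank all scores (1-based), giving tied scores their average rank.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def per_class_auc(
    labels: Dict[str, List[int]], scores: Dict[str, List[float]]
) -> Tuple[Dict[str, float], Optional[float]]:
    """Per-pathology AUC plus the unweighted mean over pathologies where AUC is defined."""
    aucs = {}
    for name, y_true in labels.items():
        auc = roc_auc(y_true, scores[name])
        if auc is not None:  # e.g. a class with no positives in the evaluation set is skipped
            aucs[name] = auc
    mean = sum(aucs.values()) / len(aucs) if aucs else None
    return aucs, mean
```

For example, `per_class_auc({"Edema": [0, 0, 1, 1], "Fracture": [0, 0, 0, 0]}, {"Edema": [0.1, 0.4, 0.35, 0.8], "Fracture": [0.2, 0.1, 0.3, 0.4]})` scores Edema at 0.75 and leaves Fracture out of the average.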
AUC on our own validation set:
Pathology | Our Best Single Model | Our Best Ensemble |
---|---|---|
Enlarged Cardiomediastinum | 0.67 | 0.67 |
Cardiomegaly | 0.87 | 0.87 |
Lung Opacity | 0.75 | 0.75 |
Lung Lesion | 0.79 | 0.79 |
Edema | 0.86 | 0.85 |
Consolidation | 0.77 | 0.76 |
Pneumonia | 0.75 | 0.74 |
Atelectasis | 0.72 | 0.69 |
Pneumothorax | 0.86 | 0.86 |
Pleural Effusion | 0.87 | 0.87 |
Pleural Other | 0.81 | 0.82 |
Fracture | 0.80 | 0.79 |
Average | 0.86 | 0.86 |
We also determined the optimal decision threshold for each pathology on the validation set for our best single model and report the precision and recall at that threshold:
Pathology | Precision | Recall |
---|---|---|
Enlarged Cardiomediastinum | 0.0750 | 0.6180 |
Cardiomegaly | 0.3762 | 0.7986 |
Lung Opacity | 0.6514 | 0.6809 |
Lung Lesion | 0.0988 | 0.7148 |
Edema | 0.5907 | 0.7805 |
Consolidation | 0.1308 | 0.6970 |
Pneumonia | 0.2003 | 0.6823 |
Atelectasis | 0.4468 | 0.6563 |
Pneumothorax | 0.25411 | 0.7859 |
Pleural Effusion | 0.7051 | 0.7899 |
Pleural Other | 0.07022 | 0.7301 |
Fracture | 0.1115 | 0.7228 |
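This page does not state which criterion defines the "optimal" threshold. As a sketch only, the code below assumes Youden's J statistic (maximize true-positive rate minus false-positive rate over the observed scores) and then computes precision and recall at the chosen cut-off. The helper names are hypothetical, and other criteria, such as maximizing F1, would generally pick different thresholds.

```python
from typing import Sequence, Tuple


def counts_at(y_true: Sequence[int], y_score: Sequence[float], threshold: float) -> Tuple[int, int, int]:
    """(TP, FP, FN) when a score >= threshold is predicted positive."""
    tp = fp = fn = 0
    for y, s in zip(y_true, y_score):
        if s >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        elif y == 1:
            fn += 1
    return tp, fp, fn


def youden_threshold(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Observed score that maximizes Youden's J = TPR - FPR.

    Assumed criterion for illustration; not confirmed as the one used on this page.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative examples")
    best_t, best_j = y_score[0], float("-inf")
    for t in sorted(set(y_score)):  # every observed score is a candidate cut-off
        tp, fp, _ = counts_at(y_true, y_score, t)
        j = tp / n_pos - fp / n_neg
        if j > best_j:  # strict '>' keeps the lowest threshold on ties
            best_t, best_j = t, j
    return best_t


def precision_recall_at(y_true: Sequence[int], y_score: Sequence[float], threshold: float) -> Tuple[float, float]:
    """Precision and recall at a fixed threshold (0.0 when a ratio is undefined)."""
    tp, fp, fn = counts_at(y_true, y_score, threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Because precision depends on how common a pathology is, rare classes such as Enlarged Cardiomediastinum or Pleural Other can show low precision even when recall is reasonable.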
{
'benchmark_name': '2_Chexpert_CWBCE_L1Normed_E5_B32_C0_N12_AugAffine_sharp21_U75_D256_DS9505_1LR4_LF1_Adam_Upsampled',
'dataset_name': 'chexpert_full',
'dataset_folder': 'data/chexpert/full',
'models_dir': 'models',
'epochs': 5,
'optimizer': 'Adam',
'learning_rate': 0.0001,
'lr_factor': 0.1,
'loss': 'weighted_binary_crossentropy',
'use_class_weights': False,
'label_columns': ['Enlarged Cardiomediastinum',
'Cardiomegaly',
'Lung Opacity',
'Lung Lesion',
'Edema',
'Consolidation',
'Pneumonia',
'Atelectasis',
'Pneumothorax',
'Pleural Effusion',
'Pleural Other',
'Fracture'],
'upsample_factors': {'Enlarged Cardiomediastinum': 1,
'Lung Lesion': 1,
'Pleural Other': 2,
'Fracture': 2},
'shuffle': True,
'batch_size': 32,
'dim': [256, 256],
'crop': False,
'transformations': {'unsharp_mask': {'radius': 2, 'amount': 1}},
'augmentation': 'affine',
'n_channels': 3,
'nan_replacement': 0,
'unc_value': -1,
'u_enc': [['Cardiomegaly',
'Enlarged Cardiomediastinum',
'Lung Opacity',
'Lung Lesion',
'Consolidation',
'Pneumothorax',
'Pleural Effusion'],
['Edema', 'Atelectasis', 'Fracture', 'Pleural Other', 'Pneumonia']],
'num_samples_train': 266998,
'num_samples_validation': 11596,
'num_samples_test': 234,
'split_seed': 6122156
}
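The configuration above uses `'loss': 'weighted_binary_crossentropy'`. The sketch below shows one common formulation of class-weighted binary cross-entropy, in which each class's positive term is weighted by the fraction of negatives and its negative term by the fraction of positives. This is an assumption for illustration: the page does not define the repository's exact weighting (for example, what the `L1Normed` suffix in the model names changes), and `class_weights` / `weighted_bce` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def class_weights(labels: Sequence[Sequence[int]]) -> List[Tuple[float, float]]:
    """Per-class (w_pos, w_neg) from label frequencies: w_pos = frac_neg, w_neg = frac_pos.

    Hypothetical weighting scheme; the repository's may differ.
    """
    n = len(labels)
    weights = []
    for c in range(len(labels[0])):
        frac_pos = sum(row[c] for row in labels) / n
        weights.append((1.0 - frac_pos, frac_pos))
    return weights


def weighted_bce(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[float]],
    weights: Sequence[Tuple[float, float]],
    eps: float = 1e-7,
) -> float:
    """Mean over samples and classes of the class-weighted binary cross-entropy."""
    total = 0.0
    for row_true, row_pred in zip(y_true, y_pred):
        for (w_pos, w_neg), y, p in zip(weights, row_true, row_pred):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total -= w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p)
    return total / (len(y_true) * len(weights))
```

With these weights the positive and negative terms of each class carry equal total weight over the training set (n_pos * w_pos = n_neg * w_neg), which keeps rare findings from being swamped by the far more numerous negatives.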
We tried different combinations of models and different ensembling methods. Our best results were achieved by combining the following 10 models in a weighted-average ensemble. These 10 models were chosen because each of them achieved the best performance on at least one class.
Model | Weight |
---|---|
DenseNet121_Chexpert_WBCE_E3_B32_C0_N12_AugAffine_U75_D256_DS9505_1LR1_LF5_SGD_Upsampled | 0.3084 |
DenseNet121_Chexpert_BCE_E3_B32_C0_N12_D256_DS9505_1LR1_SGD_NoCustomMetrics | 0.1728 |
DenseNet121_Chexpert_BCE_E3_B32_C0_N12_AugAffine_Uones_D256_DS9505_2LR1_LF5_SGD_Upsampled | 0.0382 |
DenseNet121_Chexpert_CWBCE_L1Normed_E3_B32_C0_N12_AugAffine_hist_gb_U75_D256_DS9505_2LR4_LF5_Adam_Upsampled | 0.1624 |
DenseNet121_Chexpert_BCE_E3_B32_C0_N12_AugAffine_U75_D256_DS9505_2LR1_LF5_SGD_Upsampled | 0.0544 |
DenseNet121_Chexpert_CWBCE_E3_B32_C0_N12_AugAffine_U75_D256_DS9505_1LR1_LF5_SGD_Upsampled | 0.0441 |
DenseNet121_Chexpert_CWBCE_L1Normed_E5_B32_C0_N12_AugAffine_sharp21_U75_D256_DS9505_1LR4_LF1_Adam_Upsampled | 0.0114 |
DenseNet121_Chexpert_CWBCE_L1Normed_E3_B32_C0_N12_AugColor_sharp21_U75_D256_DS9505_5LR1_LF1_SGD_Upsampled | 0.0124 |
DenseNet121_Chexpert_CWBCE_L1Normed_E3_B32_C0_N12_AugAffine_sharp21_U75_D256_DS9505_5LR1_LF1_SGD_Upsampled | 0.0006 |
DenseNet121_Chexpert_BCE_E3_B32_C0_N12_AugAffine_UZEROES_D256_DS9505_2LR1_LF5_SGD_Upsampled | 0.1954 |
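The two steps described above can be sketched as follows: keep every model that is best on at least one class, then average the models' predicted probabilities with one fixed weight per model. This page does not state how the listed weights were obtained, so the sketch simply takes them as given; the function names and toy numbers are hypothetical. The listed weights sum to 1.0001, so the renormalization in `weighted_average` changes them only negligibly.

```python
from typing import Dict, List


def select_models(val_auc: Dict[str, Dict[str, float]]) -> List[str]:
    """Keep every model that has the highest validation AUC on at least one class.

    Hypothetical helper; val_auc maps model name -> {class name -> AUC}.
    """
    classes = next(iter(val_auc.values())).keys()
    selected: List[str] = []
    for c in classes:
        best = max(val_auc, key=lambda m: val_auc[m][c])
        if best not in selected:  # a model that wins several classes is kept once
            selected.append(best)
    return selected


def weighted_average(
    predictions: Dict[str, List[List[float]]], weights: Dict[str, float]
) -> List[List[float]]:
    """Per-sample, per-class weighted mean of model probabilities.

    predictions maps model name -> [n_samples][n_classes] probabilities;
    weights are renormalized over the models present so they sum to 1.
    """
    models = list(predictions)
    total = sum(weights[m] for m in models)
    n_samples = len(predictions[models[0]])
    n_classes = len(predictions[models[0]][0])
    return [
        [sum(weights[m] * predictions[m][i][c] for m in models) / total for c in range(n_classes)]
        for i in range(n_samples)
    ]
```

For instance, with toy AUCs where model "A" is best on Edema and "B" on Atelectasis, `select_models` returns `["A", "B"]`, and averaging `[[0.9, 0.2]]` and `[[0.5, 0.6]]` with weights 0.75 and 0.25 gives approximately `[[0.8, 0.3]]`.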