# Uncertainty Encoding
First, we take a look at how the labels in the dataset are designed:
- positive label (1): observations with at least one positively classified mention in the report
- uncertain label (u): no positively classified mention and at least one uncertain mention
- negative label (0): observations with at least one negatively classified mention in the report
- NaN: no mention of the observation in the report
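For illustration, a single study in a CheXpert-style label file might look like the sketch below. The column values are hypothetical; in the raw CSV, uncertain labels are stored as -1 and "no mention" as an empty cell, which pandas reads as NaN:

```python
import numpy as np
import pandas as pd

# One study's labels (values hypothetical).
row = pd.Series({
    "Cardiomegaly": 1.0,     # positive: at least one positive mention
    "Edema": 0.0,            # negative: at least one negative mention
    "Pneumonia": -1.0,       # uncertain: only uncertain mentions
    "Pneumothorax": np.nan,  # NaN: observation not mentioned at all
})
print(row)
```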
There are three common strategies for handling the uncertain labels (a sketch follows the list):

- U-Ignore: ignore all uncertain labels during training; this, however, reduces the effective size of the dataset
- U-Zeros: map all u-labels to 0
- U-Ones: map all u-labels to 1
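A minimal sketch of these three policies, assuming the labels are a pandas DataFrame in which uncertain labels are stored as -1 (the function name is ours, not from the repository):

```python
import numpy as np
import pandas as pd

def encode_uncertainty(labels: pd.DataFrame, policy: str) -> pd.DataFrame:
    """Apply one uncertainty policy to a label table.

    "ignore" turns uncertain labels into NaN so they can be masked out later,
    "zeros" maps them to 0, and "ones" maps them to 1.
    """
    fill = {"ignore": np.nan, "zeros": 0.0, "ones": 1.0}[policy]
    return labels.replace(-1.0, fill)

# e.g. train_labels = encode_uncertainty(train_labels, "ones")
```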
- instead of using one binary mapping for all classes, a class-based approach can be used
- it has been stated in the literature that different classes work better with different uncertainty encodings
- all papers we could find focus only on the five best-performing classes, so we have no predetermined information on which encodings work best for all 12 classes
- [the only paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8719904) that reports results for all 12 classes uses the U-Ignore approach, which we ruled out because it discards too much data
We run a comparison experiment and determine the best-performing uncertainty encodings ourselves:

Pathology | U-Zeros | U-Ones |
---|---|---|
Enlarged Cardiomediastinum | 0.529 | 0.520 |
Cardiomegaly | 0.782 | 0.762 |
Lung Opacity | 0.905 | 0.858 |
Lung Lesion | 0.761 | 0.824 |
Edema | 0.901 | 0.891 |
Consolidation | 0.935 | 0.901 |
Pneumonia | 0.742 | 0.765 |
Atelectasis | 0.807 | 0.768 |
Pneumothorax | 0.692 | 0.795 |
Pleural Effusion | 0.924 | 0.927 |
Pleural Other | 0.984 | 0.877 |
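Based on these results, a class-based encoding simply applies the winning policy per column. The sketch below hardcodes the winners from the table above (again assuming uncertain labels are stored as -1; the function name is illustrative):

```python
import pandas as pd

# Classes where U-Ones scored higher in the table above;
# every other class uses U-Zeros.
U_ONES_CLASSES = {"Lung Lesion", "Pneumonia", "Pneumothorax", "Pleural Effusion"}

def encode_per_class(labels: pd.DataFrame) -> pd.DataFrame:
    encoded = labels.copy()
    for col in encoded.columns:
        fill = 1.0 if col in U_ONES_CLASSES else 0.0
        encoded[col] = encoded[col].replace(-1.0, fill)
    return encoded
```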
A problem we encountered with the dataset was the NaN values in the training set. A NaN label means that the NLP labeler could not find any mention of the observation in the report. We decide to encode these as 0, as the disease would have been explicitly mentioned in the report if it were present. With this encoding we calculate the binary cross-entropy (BCE) for all classes and average it.
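In code this amounts to filling the NaNs with 0 before computing the usual averaged BCE. A sketch with toy data, assuming NumPy labels and a Keras loss (the actual training pipeline may look different):

```python
import numpy as np
import tensorflow as tf

# Toy batch: 2 studies, 3 classes; NaN means "no mention".
y_true = np.array([[1.0, np.nan, 0.0],
                   [np.nan, 1.0, np.nan]])
y_pred = np.array([[0.8, 0.1, 0.2],
                   [0.3, 0.9, 0.1]])

y_true = np.nan_to_num(y_true, nan=0.0)  # encode "no mention" as negative

# Standard BCE, averaged over all classes and samples.
loss = tf.keras.losses.BinaryCrossentropy()(y_true, y_pred)
print(float(loss))
```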
We have also implemented and tested a masked loss function, where the BCE is calculated only for the classes with non-NaN labels, but we did not find any improvement from using this custom loss function.
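For reference, a sketch of such a masked BCE, assuming TensorFlow; since the mask is derived from the NaN positions, it must be applied to the raw labels, before any NaN-filling step:

```python
import tensorflow as tf

def masked_bce(y_true, y_pred, eps=1e-7):
    """BCE averaged only over the entries whose label is not NaN."""
    mask = tf.cast(tf.logical_not(tf.math.is_nan(y_true)), y_pred.dtype)
    # Replace NaNs with 0 so the element-wise BCE is well defined;
    # these entries are zeroed out by the mask anyway.
    y_true = tf.where(tf.math.is_nan(y_true), tf.zeros_like(y_true), y_true)
    y_pred = tf.clip_by_value(y_pred, eps, 1.0 - eps)
    elementwise = -(y_true * tf.math.log(y_pred)
                    + (1.0 - y_true) * tf.math.log(1.0 - y_pred))
    # Average over the labelled entries only.
    return tf.reduce_sum(elementwise * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)
```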