Evaluating System Results - DeepPhe/DeepPhe-Release GitHub Wiki

Included with the source for the DeepPhe system is a tool that was used to evaluate the output of the system compared to manually-curated expected results. This tool is a Java class named PhenotypeEval.

The tool can also be used to compare two runs of the system. For example, if you find a problem with the formatting of your data, you can save the results of the first run, re-run the system on your cleaned data, and use the tool to compare the two sets of results.

Overview of PhenotypeEval

To use the PhenotypeEval tool to evaluate the output of the system against manually-curated expected results, you must:

  • know the results you expect for some set of documents
  • create files that contain the expected results, in the specific format required by the tool
  • ensure you haven't removed the MedicalRecordBsvWriter from the DeepPhe pipeline

See the sample expected results in

Format of the Expected Results files

  • file name contains Tumor for tumor expected results
    • place the results for all patients to be evaluated together into the same file
  • file name contains Cancer for cancer expected results
    • place the results for all patients to be evaluated together into the same file
  • bar separated
  • first line is a header line, naming the columns
  • each column name can be preceded by one of the following
        -doc
        -
        *
    The -doc prefix indicates the column should not be evaluated, but will be included in the output. This can be used for comment fields.
    The - prefix indicates the column should be ignored.
    The * prefix indicates the field must align with a value in the annotation created by a system, or this expected result will be considered a False Negative (FN). For example, if you want to require the diagnosis created by the system to match the diagnosis in your expected results in order for the system result to be aligned with the expected result, the column heading, including the delimiters, should be |*Diagnosis|
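
The prefix rules above can be sketched in a few lines of Java. This is only an illustration of the rules as described on this page, not the code PhenotypeEval uses: the class HeaderPrefixSketch, its Role and Column types, and all column names other than Diagnosis are hypothetical, and the sketch assumes that surrounding whitespace in a header cell is not significant.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; it is not part of the DeepPhe source.
class HeaderPrefixSketch {

    enum Role { EVALUATED, DOC_ONLY, IGNORED, REQUIRED }

    // One header column: how it is treated, and its name without the prefix.
    static final class Column {
        final Role role;
        final String name;

        Column(Role role, String name) {
            this.role = role;
            this.name = name;
        }
    }

    static Column parseColumn(String rawHeader) {
        String h = rawHeader.trim();
        // Test "-doc" before "-", since both begin with a hyphen.
        if (h.startsWith("-doc")) {
            return new Column(Role.DOC_ONLY, h.substring(4).trim());
        }
        if (h.startsWith("-")) {
            return new Column(Role.IGNORED, h.substring(1).trim());
        }
        if (h.startsWith("*")) {
            return new Column(Role.REQUIRED, h.substring(1).trim());
        }
        return new Column(Role.EVALUATED, h);
    }

    static List<Column> parseHeaderLine(String headerLine) {
        List<Column> columns = new ArrayList<>();
        // The files are bar separated; "|" has to be escaped in a regex.
        for (String cell : headerLine.split("\\|")) {
            if (!cell.trim().isEmpty()) {
                columns.add(parseColumn(cell));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        // Column names other than Diagnosis are made up for this example.
        String header = "|-doc Comment|-Notes|*Diagnosis|Laterality|";
        for (Column c : parseHeaderLine(header)) {
            System.out.println(c.role + " " + c.name);
        }
    }
}
```

Checking for -doc before - matters because both prefixes start with a hyphen; reversing the order would classify every -doc column as ignored.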

PhenotypeEval optional parameters

  • -print Outputs a single line per Cancer and Tumor indicating TP, FP, TN, FN
  • -verbose Outputs details for each annotation (i.e. for the attributes for each Cancer and Tumor)
  • -strict Does not give partial credit when attributes that allow multiple values have overlapping values but not the same set of values
  • -include-those-without-location Uses expected results even if no location (body site) is listed.
    • by default, entries in the file of expected results are ignored if no location is given
    • there are times a location is not given in a patient's record, especially for mets.
  • -disallow-opposite-laterality Do not allow a Laterality value of Right to align with Laterality Left.
    • by default, results with the opposite Laterality are aligned, but the score is reduced. With this option, partial credit is given only when Left or Right is aligned with Bilateral.
  • -include-those-differ-only-by-size Because v0.2.0 does not output multiple cancers when the only difference is in their sizes, this option will ignore an entry in the file of expected results if it is the same as another entry except for sizes

PhenotypeEval required parameters

There are two required parameters. The last two parameters are interpreted as:

  1. The path to the directory containing the expected results (the gold files), or the path to a single file if you are evaluating just the cancer output or just the tumor output.
  2. The path to the system output (produced by MedicalRecordBsvWriter). This can be a single file if you are evaluating just the cancer output or just the tumor output, or a directory containing multiple files.
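
As a minimal sketch of the "last two parameters" convention, the following Java splits an argument array into leading options and the two trailing paths. The class EvalArgsSketch and its field names are hypothetical and the paths are placeholders; this is not how PhenotypeEval itself parses its arguments, which this page does not describe.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of splitting options from the two trailing paths.
class EvalArgsSketch {
    final List<String> options;   // everything before the last two arguments
    final String goldPath;        // expected results: a directory or a single file
    final String systemPath;      // system output: a directory or a single file

    EvalArgsSketch(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                "Expected at least two arguments: <gold path> <system output path>");
        }
        int n = args.length;
        options = Arrays.asList(Arrays.copyOfRange(args, 0, n - 2));
        goldPath = args[n - 2];
        systemPath = args[n - 1];
    }

    public static void main(String[] args) {
        // The paths below are placeholders, not real locations.
        EvalArgsSketch parsed = new EvalArgsSketch(new String[] {
            "-verbose", "-strict", "path/to/gold", "path/to/system-output"
        });
        System.out.println("options = " + parsed.options);
        System.out.println("gold    = " + parsed.goldPath);
        System.out.println("system  = " + parsed.systemPath);
    }
}
```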

Explanation of PhenotypeEval output

Tables of scores

The tool logs many informational messages. To easily find the table of Precision, Recall, Accuracy, and F1-measure results, search for all occurrences of "TP'", which is the column heading for the count of True Positives. These tables also give counts of:

  • False Positives (FP)
  • False Negatives (FN)
  • True Negatives (TN)
  • a weighted count of True Positives (TP).
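
For readers who want to check the tables by hand, here is a small Java sketch of the standard definitions of Precision, Recall, Accuracy, and F1-measure in terms of the four counts. The class ScoreSketch is hypothetical and the counts in main are made up. Whether the tool plugs the plain or the weighted True Positive count into these formulas is not stated on this page, so treat this as the textbook definitions only.

```java
// Hypothetical illustration of the standard metric definitions.
class ScoreSketch {

    // Fraction of system results that were correct.
    static double precision(double tp, double fp) {
        return (tp + fp) == 0 ? 0.0 : tp / (tp + fp);
    }

    // Fraction of expected results that the system found.
    static double recall(double tp, double fn) {
        return (tp + fn) == 0 ? 0.0 : tp / (tp + fn);
    }

    // Fraction of all decisions that were correct.
    static double accuracy(double tp, double tn, double fp, double fn) {
        double total = tp + tn + fp + fn;
        return total == 0 ? 0.0 : (tp + tn) / total;
    }

    // Harmonic mean of precision and recall.
    static double f1(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0 ? 0.0 : 2 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Counts invented for this example only.
        double tp = 8, fp = 2, fn = 8, tn = 0;
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        System.out.println("precision = " + p);
        System.out.println("recall    = " + r);
        System.out.println("accuracy  = " + accuracy(tp, tn, fp, fn));
        System.out.println("f1        = " + f1(p, r));
    }
}
```

The counts are taken as double rather than int so that a fractional, weighted count can be passed in as well.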

If you used the -verbose parameter, the details of each TP, FP, and FN are included in the output. For each TP, you can see the differences between the expected values and the system output.

How annotations are aligned

For an expected result to be aligned with an annotation in the system output, at least two fields need to align: the patient ID and the body location. If the body location contains multiple values, any overlap is considered an alignment.

If the -disallow-opposite-laterality parameter is used, an expected result will not be aligned with a system result if they have opposite laterality values. Bilateral will be allowed to align with either Right or Left.
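
The alignment rules described so far can be sketched as a single predicate. This is a simplified illustration, not the tool's implementation: the class AlignmentSketch and its method names are hypothetical, and it assumes body locations and laterality values are compared as plain strings.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the alignment rules described on this page.
class AlignmentSketch {

    // True only for a Left/Right pair; Bilateral is never "opposite" to anything.
    static boolean isOppositeLaterality(String a, String b) {
        return ("Left".equalsIgnoreCase(a) && "Right".equalsIgnoreCase(b))
            || ("Right".equalsIgnoreCase(a) && "Left".equalsIgnoreCase(b));
    }

    static boolean canAlign(String expectedPatient, Set<String> expectedLocations,
                            String expectedLaterality,
                            String systemPatient, Set<String> systemLocations,
                            String systemLaterality,
                            boolean disallowOppositeLaterality) {
        // The patient ID must match.
        if (!expectedPatient.equals(systemPatient)) {
            return false;
        }
        // Any shared body location counts as an overlap.
        if (Collections.disjoint(expectedLocations, systemLocations)) {
            return false;
        }
        // Left vs. Right blocks alignment only when the option is set.
        return !(disallowOppositeLaterality
                 && isOppositeLaterality(expectedLaterality, systemLaterality));
    }

    public static void main(String[] args) {
        // Patient IDs and locations are made up for this example.
        Set<String> expected = new HashSet<>(Arrays.asList("Breast", "Axilla"));
        Set<String> system = new HashSet<>(Arrays.asList("Breast"));
        System.out.println(canAlign("patient01", expected, "Left",
                                    "patient01", system, "Right", false));
        System.out.println(canAlign("patient01", expected, "Left",
                                    "patient01", system, "Right", true));
    }
}
```

The sketch only decides whether two entries may align. It does not model the reduced score or partial credit mentioned for laterality mismatches.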

There are some values that are treated specially and are allowed to align even though they are not strictly synonyms:

  • Fallopian Tube and Ovary
  • Gastric Tissue and Stomach
  • Thigh and Femur
  • Axilla and Axillary Lymph Node
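
One way to picture this special treatment is a small lookup of equivalent pairs consulted when two location values are compared. The Java below is a sketch under that assumption: the class LocationPairSketch is hypothetical, the case-insensitive comparison is an assumption, and the page does not say how the tool actually stores or applies these pairs.

```java
// Hypothetical illustration of treating the listed pairs as equivalent.
class LocationPairSketch {

    // The four pairs listed on this page.
    private static final String[][] PAIRS = {
        {"Fallopian Tube", "Ovary"},
        {"Gastric Tissue", "Stomach"},
        {"Thigh", "Femur"},
        {"Axilla", "Axillary Lymph Node"}
    };

    static boolean locationsMatch(String a, String b) {
        if (a.equalsIgnoreCase(b)) {
            return true;
        }
        // Check each pair in both directions.
        for (String[] pair : PAIRS) {
            boolean forward = pair[0].equalsIgnoreCase(a) && pair[1].equalsIgnoreCase(b);
            boolean reverse = pair[1].equalsIgnoreCase(a) && pair[0].equalsIgnoreCase(b);
            if (forward || reverse) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(locationsMatch("Thigh", "Femur"));
        System.out.println(locationsMatch("Thigh", "Ovary"));
    }
}
```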