UMLS concept mention structure - Honghan/klannotation GitHub Wiki

UMLS concept mention structure:

{
      "ruled_by": [
        "negation_filters.json" // can be a list of filters including negation, not-a-mention, hypothetical, abbreviation etc.
      ],
      "end": 1734, // end offset of the annotated term
      "pref": "Bleeding", // preferred label from UMLS for this concept
      "negation": "Negated", // "Negated" or "Affirmed"
      "sty": "Pathologic Function", // semantic type of the UMLS concept
      "start": 1726, // start offset of the annotated term
      "study_concepts": [
        "Bleeding"
      ],
      "experiencer": "Patient", // "Patient" or "Others"
      "str": "bleeding", // the string of the annotated term
      "temporality": "Recent", // "Recent", "Past" or "Hypothetical"
      "id": "cui-47", // internal ID, ignore it
      "cui": "C0019080" // UMLS concept unique identifier (CUI)
}
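In the example, `end - start` (1734 - 1726 = 8) equals the length of `str` ("bleeding"), which suggests that `start` and `end` are character offsets into the source document with `end` exclusive. A minimal Python sketch of recovering the annotated text under that assumption; `doc_text` and `mention_span` are hypothetical names used only for illustration, and the mention is assumed to be loaded as a plain dict without the explanatory comments:

```python
# Mention from the example above, with the explanatory comments removed.
mention = {
    "ruled_by": ["negation_filters.json"],
    "end": 1734,
    "pref": "Bleeding",
    "negation": "Negated",
    "sty": "Pathologic Function",
    "start": 1726,
    "study_concepts": ["Bleeding"],
    "experiencer": "Patient",
    "str": "bleeding",
    "temporality": "Recent",
    "id": "cui-47",
    "cui": "C0019080",
}


def mention_span(doc_text: str, mention: dict) -> str:
    """Return the text covered by the mention, treating `end` as exclusive."""
    return doc_text[mention["start"]:mention["end"]]


# Hypothetical document: padded so that "bleeding" starts at offset 1726.
doc_text = " " * 1726 + "bleeding was not observed."
span = mention_span(doc_text, mention)
print(span == mention["str"])  # prints True
```

If the offsets in your own output do not line up with `str`, check whether they refer to a differently encoded or pre-processed version of the document.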
  • ruled_by lists the rule sets that the annotation matched. There are several types of rules, covering negation, hypothetical mentions, not-a-mention cases and other experiencers (see here for full list). These rules were developed for clinical studies conducted on SLaM CRIS data. You can also create your own rules using the same syntax (regular expressions). Consider this an extra improvement step on top of the NLP model embedded in SemEHR (bio-yodie for now).
  • study_concepts gives the type(s) of the annotation as specified in the study configuration. A study concept denotes a list of UMLS CUIs chosen by the study designer (e.g., cancer can be mapped to many UMLS concepts). The studies folder of the SemEHR repo contains configurations of several studies conducted on SLaM CRIS.
  • All other attributes are general SemEHR attributes, as specified in the wiki.
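As a rough illustration of how these attributes might be combined downstream, the Python sketch below keeps only mentions that matched no filter rule and are affirmed, about the patient and not hypothetical, then groups them by study concept. The inclusion criteria, the function names and the second mention in the sample data are assumptions made for this example, not part of SemEHR or of any study configuration.

```python
from collections import defaultdict


def is_positive_patient_mention(mention: dict) -> bool:
    """Assumed inclusion criteria; adjust them to the study's own definition."""
    return (
        not mention.get("ruled_by")  # matched no filter rule set
        and mention.get("negation") == "Affirmed"
        and mention.get("experiencer") == "Patient"
        and mention.get("temporality") != "Hypothetical"
    )


def group_by_study_concept(mentions: list) -> dict:
    """Map each study concept to the mentions that pass the criteria above."""
    grouped = defaultdict(list)
    for mention in mentions:
        if is_positive_patient_mention(mention):
            for concept in mention.get("study_concepts", []):
                grouped[concept].append(mention)
    return dict(grouped)


mentions = [
    # Abridged version of the example mention: negated and matched a filter.
    {"ruled_by": ["negation_filters.json"], "negation": "Negated",
     "experiencer": "Patient", "temporality": "Recent",
     "study_concepts": ["Bleeding"], "cui": "C0019080", "str": "bleeding"},
    # Hypothetical second mention, invented for this example.
    {"ruled_by": [], "negation": "Affirmed",
     "experiencer": "Patient", "temporality": "Recent",
     "study_concepts": ["Bleeding"], "cui": "C0019080", "str": "haemorrhage"},
]

grouped = group_by_study_concept(mentions)
for concept, kept in grouped.items():
    print(concept, [m["str"] for m in kept])
```

Treating a non-empty `ruled_by` as grounds for exclusion is a design choice in this sketch; a study may instead want to inspect which rule set fired before discarding a mention.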