4. Understanding MetaMap output - NCBI-Codeathons/Use-UMLS-and-Python-to-classify-website-visitor-queries-into-measurable-categories GitHub Wiki

(Optional - not part of MVP.)

For more information, see https://metamap.nlm.nih.gov/

MetaMap MMI output documentation: https://metamap.nlm.nih.gov/Docs/MMI_Output_2014.pdf

Examples (two lines of MMI output, one UMLS concept per line, fields separated by |)

text_000N_50478|MMI|333.76|PubMed|C1138432|[inpr]|["PubMed"-tx-1-"pubmed"-noun-0]|TX|[54/6],[225/6],[496/6],[730/6],[1013/6],[1360/6],[1474/6],[1667/6],[1827/6],[1868/6]|L01.313.500.750.280.750;L01.313.500.750.300.188.300.650;L01.313.500.750.300.742.650;L01.470.750.500.650

text_000N_50478|MMI|315.12|National Library of Medicine (U.S.)|C0027470|[hcro]|["NLM"-tx-1-"nlm"-noun-0,"National Library of Medicine"-tx-1-"national library of medicine"-noun-0,"United States National Library of Medicine"-tx-1-"united states national library of medicine"-noun-0]|TX|[319/3],[814/3],[1239/3],[1647/3],[1856/3];[78/28],[189/28],[898/28];957/42|I01.409.418.750.600.
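A minimal Python sketch of reading these lines, assuming the ten-field layout seen in the two examples above (the second field is the literal MMI) and assuming no field contains a literal |. It splits each line into the columns described in the data dictionary below and converts the score, semantic type list, and treecode list into Python values. The names MMIRecord and parse_mmi_line are hypothetical, not part of MetaMap.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MMIRecord:
    """One line of MetaMap MMI output, named after the data dictionary."""
    doc_id: str
    score: float
    preferred_name: str
    cui: str
    semantic_types: list[str]
    trigger_info: str      # left raw here
    location: str
    positional_info: str   # left raw here
    treecodes: list[str]


def parse_mmi_line(line: str) -> MMIRecord:
    """Split one pipe-delimited MMI line into named fields.

    Assumes the 10-field layout of the examples above, where the second
    field is the literal string "MMI", and that no field contains "|".
    """
    fields = line.rstrip("\n").split("|")
    if len(fields) != 10 or fields[1] != "MMI":
        raise ValueError(f"not a 10-field MMI line: {line!r}")
    (doc_id, _kind, score, name, cui, semtypes,
     triggers, location, positions, treecodes) = fields
    return MMIRecord(
        doc_id=doc_id,
        score=float(score),
        preferred_name=name,
        cui=cui,
        # "[inpr]" -> ["inpr"]; several types are comma-separated in the brackets
        semantic_types=[s for s in semtypes.strip("[]").split(",") if s],
        trigger_info=triggers,
        location=location,
        positional_info=positions,
        # Semicolon-separated; an empty field gives an empty list
        treecodes=[t for t in treecodes.split(";") if t],
    )
```

Because MMI already lists results from highest to lowest score, no sorting is needed; filtering records on score or semantic_types is one possible starting point for grouping queries into categories.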

Data dictionary

Column Description
ID Unique identifier of the text being processed. If no identifier is found in the text, 00000000 is displayed. For MEDLINE ASCII formatted input, the ID is taken to be the PMID.
Score MetaMap Indexing (MMI) score, with a maximum of 1000.00. The higher the score, the greater the relevance of the UMLS concept according to the MMI algorithm. MMI results are listed in order from highest to lowest relevance.
UMLS Concept Preferred Name The preferred name for the UMLS concept identified in the text.
UMLS Concept Unique Identifier (CUI) The CUI for the identified UMLS concept.
Semantic Type List Comma-separated list of Semantic Type abbreviations for the identified UMLS concept, enclosed in square brackets (for example, [inpr]). More information on Semantic Types and Semantic Groups is available at http://metamap.nlm.nih.gov/SemanticTypesAndGroups.shtml.
Trigger Information Comma-separated list, in square brackets, of the triggers that caused MMI to identify this UMLS concept. The ordering tends to run from last found to first found. Each trigger is a hyphen-separated quadruple, “UMLS concept-loc-locPos-text”:
• UMLS concept: the preferred name or synonym text
• loc: location in the text, if identifiable: ti (Title), ab (Abstract), or tx (Free Text)
• locPos: number of the utterance within loc, starting with one (1). For example, “ti-1” denotes the first utterance in the title.
• text: the actual text mapped to this UMLS concept
In the examples above, each trigger carries two further values after text (noun and 0); these appear to be a part of speech and a negation flag, which the 2014 documentation does not describe.
Location Summarizes where the UMLS concept was found: TI (Title), AB (Abstract), TX (Free Text), or TI;AB (Title and Abstract).
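Positional Information Start position and length of each trigger identified in the Trigger Information field. StartPos counts from position zero (0) of the input text. The 2014 documentation writes each trigger as StartPos:Length; in the sample output of the MMI output documentation linked above, “228:6 - 136:7” shows that MetaMap found two triggers: one starting at position 228 that is 6 characters long (isopod), and one starting at position 136 that is 7 characters long (Isopoda). Like the trigger information, the positional information tends to run from last found to first found. The examples above use a newer notation: each occurrence is written StartPos/Length (usually in square brackets), occurrences of the same trigger are comma-separated, and semicolons separate the groups for different triggers, in the same order as the triggers. For instance, in the second example [319/3] is a 3-character match (NLM), [78/28] a 28-character match (National Library of Medicine), and 957/42 a 42-character match (United States National Library of Medicine).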
Treecode(s) Semicolon-separated list of any MeSH treecode(s) (http://www.nlm.nih.gov/mesh/meshhome.html) associated with the UMLS concept; the field may be empty if no treecodes were found for the concept.
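The Trigger Information and Positional Information fields are the hardest to read by eye. The Python sketch below parses both, assuming the formats seen in the examples on this page: six hyphen-separated values per trigger (the last two assumed to be part of speech and a negation flag) and StartPos/Length pairs grouped by semicolons. It also accepts the StartPos:Length pairs of the 2014 documentation. Square brackets are ignored, since this page does not explain why some pairs have them and others (957/42) do not. All names (Trigger, parse_triggers, parse_positions, extract_spans) are hypothetical.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class Trigger(NamedTuple):
    concept: str   # UMLS concept text (preferred name or synonym)
    loc: str       # ti, ab or tx
    loc_pos: int   # utterance number within loc, starting at 1
    text: str      # input text mapped to the concept
    pos: str       # ASSUMED: part of speech, e.g. "noun"
    negated: bool  # ASSUMED: negation flag, "1" meaning negated


# Six hyphen-separated values per trigger; quoted values may contain
# hyphens or commas, but are assumed not to contain double quotes.
TRIGGER_RE = re.compile(r'"([^"]*)"-([a-z]+)-(\d+)-"([^"]*)"-([^-]*)-(\d)')

# A StartPos/Length pair (newer output) or StartPos:Length pair (2014 docs).
PAIR_RE = re.compile(r"(\d+)[/:](\d+)")


def parse_triggers(field: str) -> list[Trigger]:
    """Parse a Trigger Information field such as '["PubMed"-tx-1-"pubmed"-noun-0]'."""
    return [
        Trigger(concept, loc, int(loc_pos), text, pos, flag == "1")
        for concept, loc, loc_pos, text, pos, flag in TRIGGER_RE.findall(field)
    ]


def parse_positions(field: str) -> list[list[tuple[int, int]]]:
    """Parse a Positional Information field into one list of
    (start, length) pairs per semicolon-separated group.
    Square brackets are ignored."""
    groups = []
    for group in field.split(";"):
        pairs = [(int(s), int(n)) for s, n in PAIR_RE.findall(group)]
        if pairs:
            groups.append(pairs)
    return groups


def extract_spans(text: str, pairs: list[tuple[int, int]]) -> list[str]:
    """Slice the original input text; StartPos counts from 0."""
    return [text[start:start + length] for start, length in pairs]
```

Given the original input text (not included on this page), extract_spans(text, groups[i]) should return the occurrences of triggers[i], compared case-insensitively because the text value in the examples is lowercased. This is a quick way to check the assumed one-to-one ordering between trigger groups and positional groups.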