Annotation
Targeted set of drugs: Adderall, Ritalin, Modafinil, Adrafinil, Armodafinil, Citalopram, Escitalopram, Paroxetine, Fluoxetine, Fluvoxamine, Sertraline
Targeted articles: Europe PMC (http://europepmc.org/)
Pipeline:
- Drug-based article filtering (done)
- Patient descriptor identification (in progress)
- Phenotype identification (in progress)
- Article sentence annotation (in progress)
Pipeline technical details:
- Drug-based filtering: Europe PMC abstracts annotated with the NCBO Annotator (code: https://github.com/jmbanda/BLAH2015)
- Patient descriptor identification: Simple Rule Language tool (in progress)
- Phenotype identification: Python dictionary-based tool (available)
- Article sentence annotation: BRAT rapid annotation tool (in progress)
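As a rough illustration of the drug-based filtering step, the sketch below keeps only abstracts that name one of the targeted drugs. It is a plain regular-expression stand-in, not the NCBO Annotator workflow used in the linked repository; the function names and the `{article_id: abstract}` input shape are assumptions made for this example.

```python
import re

# Drug names copied from the targeted set listed on this page.
DRUGS = [
    "Adderall", "Ritalin", "Modafinil", "Adrafinil", "Armodafinil",
    "Citalopram", "Escitalopram", "Paroxetine", "Fluoxetine",
    "Fluvoxamine", "Sertraline",
]

# One case-insensitive pattern with word boundaries, so that "Modafinil"
# does not match inside "Armodafinil".
DRUG_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(d) for d in DRUGS) + r")\b",
    re.IGNORECASE,
)


def drugs_mentioned(abstract):
    """Return the sorted, lower-cased targeted drugs named in an abstract."""
    return sorted({m.group(1).lower() for m in DRUG_PATTERN.finditer(abstract)})


def filter_abstracts(abstracts):
    """Map article ID -> drugs found, dropping abstracts with no match.

    `abstracts` is a hypothetical {article_id: abstract_text} dictionary.
    """
    kept = {}
    for article_id, text in abstracts.items():
        found = drugs_mentioned(text)
        if found:
            kept[article_id] = found
    return kept
```

A string match like this misses synonyms and ingredient names (for example, methylphenidate for Ritalin), which an ontology-backed annotator is better placed to catch.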
Potential issues:
- Identify the article ID needed to retrieve each article (Europe PMC provides several IDs for the same article)
- Define the list of patient descriptors
- PhenoMiner's list of phenotypes is quite specific: how well can we match them in our set of articles?
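One possible way to handle the multiple-ID issue is to pick a single preferred identifier per article and key everything on it. The sketch below assumes each article is a dictionary with optional `pmcid`, `pmid` and `doi` fields; those field names and the preference order are assumptions for illustration, not a confirmed description of the Europe PMC response.

```python
# Assumed preference order: PMCID first (it points at full text),
# then PMID, then DOI.
ID_PRIORITY = ("pmcid", "pmid", "doi")


def canonical_id(record):
    """Return (id_type, value) for the first non-empty identifier."""
    for field in ID_PRIORITY:
        value = str(record.get(field) or "").strip()
        if value:
            return field, value
    raise ValueError("record has no usable identifier")


def deduplicate(records):
    """Keep the first record seen for each canonical identifier."""
    seen = {}
    for record in records:
        seen.setdefault(canonical_id(record), record)
    return list(seen.values())
```

This only collapses records that share the same preferred identifier. Two records for one article that expose different ID types (one with only a PMID, one with only a DOI) would still need a cross-reference table to be merged.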
Annotation of the CRAFT corpus with PhenoMiner terms
Task list
- Download CRAFT and the PhenoMiner term list from GitHub (https://github.com/nhcollier/PhenoMiner) -- done
- Automated matching on CRAFT texts to suggest candidate terms from PhenoMiner -- in progress
- Manual annotation of CRAFT in BRAT, using the candidate term annotations suggested by the automated matching step
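The hand-off from automated matching to manual annotation could look roughly like the sketch below, which runs exact, case-insensitive matching over a text and writes the hits as BRAT standoff text-bound annotations (`ID<TAB>Type start end<TAB>text`). The `Phenotype` label and the function name are assumptions for this example; the label would have to match whatever entity types the BRAT project configuration actually defines.

```python
import re


def suggest_annotations(text, terms, label="Phenotype"):
    """Return BRAT standoff lines for exact matches of `terms` in `text`.

    Offsets are character offsets into `text`, with an exclusive end.
    """
    spans = set()
    for term in terms:
        # Word boundaries avoid matching inside longer words.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            spans.add((match.start(), match.end()))

    lines = []
    for number, (start, end) in enumerate(sorted(spans), start=1):
        lines.append(f"T{number}\t{label} {start} {end}\t{text[start:end]}")
    return lines


if __name__ == "__main__":
    sample = "Mice showed increased body weight."
    for line in suggest_annotations(sample, ["body weight"]):
        print(line)
```

Saving the returned lines to a `.ann` file next to the matching `.txt` file lets annotators review the suggestions in BRAT and accept, correct or delete them.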
Potential issues:
- Exact term matching needs to be more flexible (it may miss disjoint terms)
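One simple way to relax exact matching is to match term tokens in order while allowing a few intervening tokens. The sketch below illustrates that idea and is not the matcher used in this project; `find_gapped` and its `max_gap` parameter are hypothetical names.

```python
import re


def tokenize(text):
    """Split into lower-cased word tokens with their character offsets."""
    return [(m.group(0).lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


def find_gapped(text, term, max_gap=2):
    """Find `term` tokens in order, allowing up to `max_gap` extra tokens
    between consecutive term tokens. Returns (start, end, covered_text)."""
    tokens = tokenize(text)
    wanted = [tok for tok, _, _ in tokenize(term)]
    matches = []
    if not wanted:
        return matches
    for i, (tok, start, end) in enumerate(tokens):
        if tok != wanted[0]:
            continue
        pos, complete = i, True
        for word in wanted[1:]:
            # Look at the next token plus up to max_gap tokens after it.
            window = range(pos + 1, min(pos + 2 + max_gap, len(tokens)))
            nxt = next((j for j in window if tokens[j][0] == word), None)
            if nxt is None:
                complete = False
                break
            pos, end = nxt, tokens[nxt][2]
        if complete:
            matches.append((start, end, text[start:end]))
    return matches
```

For example, with the default gap the term "body weight" is found in "Reduced body and liver weight was seen." as the span "body and liver weight", whereas `max_gap=0` reduces to exact token-sequence matching. Larger gaps raise recall at the cost of more false suggestions, so the manual review step remains necessary.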