Clinical NLP Transformers
This is an example piper file that spins up a complete ctakes-pbj (Python Bridge to Java) pipeline.
(Cnlpt Negation)
graph TD;
A[Artemis Starter] --> D[Starts a new instance of cTAKES with the given piper parameters]
A --> E[Installs a specified Python package via pip]
E --> F[Starts a Python process with the given parameters]
E --> G[Single Sectionizer]
G --> H[Sentence Detector]
H --> I[PTB Tokenizer]
I --> J[Fast dictionary lookup]
J --> K[Sends the JCas to the Artemis queue]
K --> L[Add the Finished Logger for some run statistics]
L --> M[Forcibly Exits cTAKES]
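The Artemis, pip and Python-process steps above amount to launching external commands. A minimal sketch of the commands those two PBJ steps would issue; the script name `negation_client.py` and the `--receive-queue` flag are hypothetical placeholders, and in practice the piper commands manage these processes for you:

```python
import sys

def pip_install_cmd(package: str) -> list:
    # Command the "installs a Python package via pip" step would run.
    return [sys.executable, "-m", "pip", "install", "--upgrade", package]

def python_process_cmd(script: str, *args: str) -> list:
    # Command the "starts a Python process" step would run.
    return [sys.executable, script, *args]

install_cmd = pip_install_cmd("ctakes-pbj")
run_cmd = python_process_cmd("negation_client.py", "--receive-queue", "to_python")
```

Either list can be handed to `subprocess.run` to actually execute the step.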
Degree-of and location-of relations, events, times, temporal relations, document creation time relations and coreferences.
(Default Advanced Pipeline)
graph TD;
G[Single Sectionizer]
G --> H[Sentence Detector]
H --> I[PTB Tokenizer]
I --> J[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
J --> K[Part of Speech Tagger]
K --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
L --> M[Default fast dictionary lookup]
M --> N[Adds Semantic Roles Relations]
N --> O[Annotates Modifiers and Chunks]
O --> P[Degree of Annotator]
P --> Q[Location of Annotator]
Q --> R[Event Annotator]
R --> S[Annotates absolute time / date Temporal expressions]
S --> T[Annotates event relativity to document creation time]
T --> U[Annotates Temporal Events]
U --> V[Creates Event - Event TLinks]
V --> W[Adds Terminal Treebank Nodes, necessary for Coreference Markables]
W --> X[Deterministic Markable Annotator]
Pipeline with coreference resolution.
(Default Coref Pipeline)
graph TD;
W[Adds Terminal Treebank Nodes, necessary for Coreference Markables]
W --> X[Deterministic Markable Annotator]
Commands and parameters to create a plaintext document processing pipeline with UMLS entity lookup.
(Default Fast Pipeline)
graph TD;
G[Single Sectionizer]
G --> H[Sentence Detector]
H --> I[PTB Tokenizer]
I --> J[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
J --> K[Part of Speech Tagger]
K --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
L --> M[Default fast dictionary lookup]
M --> N[Adds Semantic Roles Relations]
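The heart of the fast dictionary lookup step is matching token windows against a term table. A toy sketch with an in-memory dict standing in for the dictionary; the real cTAKES lookup consults an indexed UMLS dictionary database and handles overlapping and permuted spans, which this greedy longest-match version does not:

```python
# Toy term -> UMLS concept (CUI) table; the CUIs here are illustrative.
TERMS = {
    ("chest", "pain"): "C0008031",
    ("pain",): "C0030193",
    ("aspirin",): "C0004057",
}
MAX_SPAN = max(len(key) for key in TERMS)

def lookup(tokens):
    """Greedy longest-match lookup; returns (start, end, cui) tuples."""
    hits, i = [], 0
    while i < len(tokens):
        for span in range(min(MAX_SPAN, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + span])
            if key in TERMS:
                hits.append((i, i + span, TERMS[key]))
                i += span
                break
        else:
            i += 1
    return hits
```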
Pipeline with degree-of and location-of relations and coreference resolution.
(Default Relation Coref Pipeline)
graph TD;
G[Single Sectionizer]
G --> H[Sentence Detector]
H --> I[PTB Tokenizer]
I --> J[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
J --> K[Part of Speech Tagger]
K --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
L --> M[Default fast dictionary lookup]
M --> N[Adds Semantic Roles Relations]
N --> O[Annotates Modifiers and Chunks]
O --> P[Degree of Annotator]
P --> Q[Location of Annotator]
Q --> W[Adds Terminal Treebank Nodes, necessary for Coreference Markables]
W --> X[Deterministic Markable Annotator]
Clinical Pipeline with degree-of and location-of relations.
(Default Relation Pipeline)
graph TD;
G[Single Sectionizer]
G --> H[Sentence Detector]
H --> I[PTB Tokenizer]
I --> J[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
J --> K[Part of Speech Tagger]
K --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
L --> M[Default fast dictionary lookup]
M --> N[Adds Semantic Roles Relations]
N --> O[Annotates Modifiers and Chunks]
O --> P[Degree of Annotator]
P --> Q[Location of Annotator]
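The location-of annotator links disorder mentions to anatomical sites. cTAKES does this with trained relation classifiers; the toy rule below, which just pairs each disorder with the nearest anatomical-site mention, is only a sketch of the kind of output the step produces:

```python
def location_of(mentions):
    """mentions: list of (position, kind, text) tuples.
    Returns (disorder_text, site_text) pairs by nearest-site heuristic."""
    sites = [m for m in mentions if m[1] == "site"]
    pairs = []
    for m in mentions:
        if m[1] == "disorder" and sites:
            nearest = min(sites, key=lambda s: abs(s[0] - m[0]))
            pairs.append((m[2], nearest[2]))
    return pairs
```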
Clinical Pipeline with degree-of, location-of, events, times, temporal and event-doc creation time relations.
(Default Relation Temporal Pipeline)
graph TD;
G[Single Sectionizer]
G --> H[Sentence Detector]
H --> I[PTB Tokenizer]
I --> J[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
J --> K[Part of Speech Tagger]
K --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
L --> M[Default fast dictionary lookup]
M --> N[Adds Semantic Roles Relations]
N --> O[Annotates Modifiers and Chunks]
O --> P[Degree of Annotator]
P --> Q[Location of Annotator]
Q --> R[Event Annotator]
R --> S[Annotates absolute time / date Temporal expressions]
S --> T[Annotates event relativity to document creation time]
T --> U[Annotates Temporal Events]
U --> V[Creates Event - Event TLinks]
Pipeline with events, times, temporal relations, document creation time relations and coreferences.
(Default Temporal Coref Pipeline)
graph TD;
G[Single Sectionizer]
G --> H[Sentence Detector]
H --> I[PTB Tokenizer]
I --> J[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
J --> K[Part of Speech Tagger]
K --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
L --> M[Default fast dictionary lookup]
M --> N[Adds Semantic Roles Relations]
N --> R[Event Annotator]
R --> S[Annotates absolute time / date Temporal expressions]
S --> T[Annotates event relativity to document creation time]
T --> U[Annotates Temporal Events]
U --> V[Creates Event - Event TLinks]
V --> W[Adds Terminal Treebank Nodes, necessary for Coreference Markables]
W --> X[Deterministic Markable Annotator]
Clinical Pipeline with events, times, event-event and event-time relations plus event-document creation time relations.
(Default Temporal Pipeline)
graph TD;
G[Single Sectionizer]
G --> H[Sentence Detector]
H --> I[PTB Tokenizer]
I --> J[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
J --> K[Part of Speech Tagger]
K --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
L --> M[Default fast dictionary lookup]
M --> N[Adds Semantic Roles Relations]
N --> R[Event Annotator]
R --> S[Annotates absolute time / date Temporal expressions]
S --> T[Annotates event relativity to document creation time]
T --> U[Annotates Temporal Events]
U --> V[Creates Event - Event TLinks]
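The temporal steps emit events, a document-creation-time relation (DTR) per event, and TLinks between events and times. A sketch using stand-in dataclasses; the real annotations are cTAKES UIMA types such as EventMention and TimeMention, not these classes:

```python
from dataclasses import dataclass

@dataclass
class Event:
    text: str
    dtr: str = "OVERLAP"   # relation to document creation time

@dataclass
class TLink:
    source: str
    target: str
    relation: str          # e.g. BEFORE, OVERLAP, CONTAINS

# "The patient was admitted after a CT scan" might yield:
admit = Event("admitted", dtr="BEFORE")
scan = Event("CT scan", dtr="BEFORE")
links = [TLink(scan.text, admit.text, "BEFORE")]
```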
Commands and parameters for a small tokenization pipeline.
(Default Tokenizer Pipeline)
graph TD;
G[Single Sectionizer]
G --> H[Sentence Detector]
H --> I[PTB Tokenizer]
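The sentence detector and PTB tokenizer can be approximated in a few lines. These regex stand-ins are far cruder than the model-based cTAKES components (and the tokenizer only roughly follows Penn Treebank conventions), but they show what the two steps produce:

```python
import re

def detect_sentences(text):
    # Naive split after sentence-final punctuation followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def ptb_tokenize(sentence):
    # Words (optionally with an apostrophe suffix) or single punctuation marks.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)
```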
Commands and parameters for a small tokenization pipeline with sections, paragraphs and lists.
(Full Tokenizer Pipeline)
graph TD;
A[Annotates Document Sections by detecting Section Headers using Regular Expressions provided in a bar-separated-value BSV file] --> B[Annotates Sentences based upon an OpenNLP model]
B --> C[Paragraphs are parsed using empty lines as separators]
C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
E --> F[Fix sentences so that no sentence spans across two or more list entries]
F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
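The paragraph step and the "no sentence spans two paragraphs" fix are simple span arithmetic: paragraphs are the blank-line-separated blocks, and any sentence span crossing a paragraph boundary is clipped at that boundary. A minimal sketch over character offsets:

```python
def paragraphs(text):
    """Return (start, end) character spans of blank-line-separated blocks."""
    spans, pos = [], 0
    for block in text.split("\n\n"):
        if block.strip():
            spans.append((pos, pos + len(block)))
        pos += len(block) + 2  # account for the "\n\n" separator
    return spans

def clip_sentences(sent_spans, para_spans):
    """Split every sentence span at the paragraph boundaries it crosses."""
    fixed = []
    for s, e in sent_spans:
        for ps, pe in para_spans:
            lo, hi = max(s, ps), min(e, pe)
            if lo < hi:
                fixed.append((lo, hi))
    return fixed
```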
This is a piper file that performs the initial steps required for running a ctakes-pbj pipeline.
(Pbj Starter)
graph TD;
A[Starting Artemis Broker Instance] --> B[Installs the dependency packages with pip in case your environment doesn't have them or needs an update]
B --> C[Add the Finished Logger for some run statistics]
C --> D[Force a stop, just in case some external process is trying to stay connected]
This is a piper file that performs the final steps required for stopping a ctakes-pbj pipeline.
(Pbj Stopper)
graph TD;
A[Stop the Artemis Broker] --> B[Add the Finished Logger for some run statistics]
B --> C[Force a stop, just in case some external process is trying to stay connected]
ADVANCED PIPELINES
Pipeline with section, paragraph and list detection, degree-of and location-of relations ...
(Sectioned Advanced Pipeline)
graph TD;
A[Annotate sections by known regex] --> B[Sentence Detector]
B --> C[Paragraph Annotator]
C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
E --> F[Fix sentences so that no sentence spans across two or more list entries]
F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
G --> H[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
H --> I[Annotate Parts of Speech]
I --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
L --> M[Default fast dictionary lookup]
M --> N[Adds Semantic Roles Relations]
N --> O[Annotates Modifiers and Chunks]
O --> P[Annotates Degree Of relations]
P --> Q[Annotates Location Of relations]
Q --> R[Annotates Temporal Events]
R --> S[Annotates absolute time / date Temporal expressions]
S --> T[Annotates event relativity to document creation time]
T --> U[Creates Event - Time TLinks]
U --> V[Creates Event - Event TLinks]
V --> W[Adds Terminal Treebank Nodes, necessary for Coreference Markables]
W --> X[Deterministic Markable Annotator]
X --> Y[Annotates Markable Salience]
Y --> Z[MentionClusterCoreferenceAnnotator]
Pipeline with section, paragraph and list detection and coreference resolution.
(Sectioned Coref Pipeline)
graph TD;
A[Annotate sections by known regex] --> B[Sentence Detector]
B --> C[Paragraph Annotator]
C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
E --> F[Fix sentences so that no sentence spans across two or more list entries]
F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
G --> H[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
H --> I[Annotate Parts of Speech]
I --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
L --> M[Default fast dictionary lookup]
M --> N[Adds Semantic Roles Relations]
N --> W[Adds Terminal Treebank Nodes, necessary for Coreference Markables]
W --> X[Deterministic Markable Annotator]
X --> Y[Annotates Markable Salience]
Y --> Z[MentionClusterCoreferenceAnnotator]
Commands and parameters to create a plaintext document processing pipeline with Sections, paragraphs and lists
(Sectioned Fast Pipeline)
graph TD;
A[Annotate sections by known regex] --> B[Sentence Detector]
B --> C[Paragraph Annotator]
C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
E --> F[Fix sentences so that no sentence spans across two or more list entries]
F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
G --> H[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
H --> I[Annotate Parts of Speech]
I --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
L --> M[Default fast dictionary lookup]
M --> N[Adds Semantic Roles Relations]
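Section detection works by matching each line of the note against header regexes loaded from a bar-separated-value (BSV) file. A sketch with a tiny inline table; the three-column name|id|regex layout here is a hypothetical simplification of the BSV files that ship with cTAKES:

```python
import re

BSV = (
    "History of Present Illness|HPI|(?i)^history of present illness\\s*:\n"
    "Medications|MEDS|(?i)^medications\\s*:"
)

SECTIONS = []
for line in BSV.splitlines():
    name, _section_id, pattern = line.split("|")
    SECTIONS.append((name, re.compile(pattern)))

def find_sections(lines):
    """Return (line_index, section_name) for each line matching a header regex."""
    hits = []
    for i, text in enumerate(lines):
        for name, rx in SECTIONS:
            if rx.match(text):
                hits.append((i, name))
    return hits
```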
Pipeline with section, paragraph and list detection, degree-of and location-of relations and coreferences
(Sectioned Relation Coref Pipeline)
graph TD;
A[Annotate sections by known regex] --> B[Sentence Detector]
B --> C[Paragraph Annotator]
C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
E --> F[Fix sentences so that no sentence spans across two or more list entries]
F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
G --> H[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
H --> I[Annotate Parts of Speech]
I --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
L --> M[Default fast dictionary lookup]
M --> N[Adds Semantic Roles Relations]
N --> O[Annotates Modifiers and Chunks]
O --> P[Annotates Degree Of relations]
P --> Q[Annotates Location Of relations]
Q --> W[Adds Terminal Treebank Nodes, necessary for Coreference Markables]
W --> X[Deterministic Markable Annotator]
X --> Y[Annotates Markable Salience]
Y --> Z[MentionClusterCoreferenceAnnotator]
Clinical Pipeline with section, paragraph and list detection and degree-of and location-of relations
(Section Relation Pipeline)
graph TD;
A[Annotate sections by known regex] --> B[Sentence Detector]
B --> C[Paragraph Annotator]
C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
E --> F[Fix sentences so that no sentence spans across two or more list entries]
F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
G --> H[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
H --> I[Annotate Parts of Speech]
I --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
L --> M[Default fast dictionary lookup]
M --> N[Adds Semantic Roles Relations]
N --> O[Annotates Modifiers and Chunks]
O --> P[Annotates Degree Of relations]
P --> Q[Annotates Location Of relations]
Clinical Pipeline with sections, paragraphs, lists, degree-of, location-of, events, times, temporal and event-doc creation time relations
(Section Relation Temporal Pipeline)
graph TD;
A[Annotate sections by known regex] --> B[Sentence Detector]
B --> C[Paragraph Annotator]
C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
E --> F[Fix sentences so that no sentence spans across two or more list entries]
F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
G --> H[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
H --> I[Annotate Parts of Speech]
I --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
L --> M[Default fast dictionary lookup]
M --> N[Adds Semantic Roles Relations]
N --> O[Annotates Modifiers and Chunks]
O --> P[Annotates Degree Of relations]
P --> Q[Annotates Location Of relations]
Q --> R[Annotates Temporal Events]
R --> S[Annotates absolute time / date Temporal expressions]
S --> T[Annotates event relativity to document creation time]
T --> U[Creates Event - Time TLinks]
U --> V[Creates Event - Event TLinks]
Pipeline with section, paragraph and list detection, events, times, temporal relations and document creation time relations
(Sectioned Temporal Coref Pipeline)
graph TD;
A[Annotate sections by known regex] --> B[Sentence Detector]
B --> C[Paragraph Annotator]
C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
E --> F[Fix sentences so that no sentence spans across two or more list entries]
F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
G --> H[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
H --> I[Annotate Parts of Speech]
I --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
L --> M[Default fast dictionary lookup]
M --> N[Adds Semantic Roles Relations]
N --> O[Annotates Modifiers and Chunks]
O --> P[Annotates Degree Of relations]
P --> Q[Annotates Location Of relations]
Q --> R[Annotates Temporal Events]
R --> S[Annotates absolute time / date Temporal expressions]
S --> T[Annotates event relativity to document creation time]
T --> U[Creates Event - Time TLinks]
U --> V[Creates Event - Event TLinks]
Clinical Pipeline with sections, paragraphs, lists, events, times, temporal and event-doc creation time relations
(Sectioned Temporal Pipeline)
graph TD;
A[Annotate sections by known regex] --> B[Sentence Detector]
B --> C[Paragraph Annotator]
C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
E --> F[Fix sentences so that no sentence spans across two or more list entries]
F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
G --> H[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
H --> I[Annotate Parts of Speech]
I --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
L --> M[Default fast dictionary lookup]
M --> N[Adds Semantic Roles Relations]
N --> R[Annotates Temporal Events]
R --> S[Annotates absolute time / date Temporal expressions]
S --> T[Annotates event relativity to document creation time]
T --> U[Creates Event - Time TLinks]
U --> V[Creates Event - Event TLinks]