Piper File List Diagrams

Clinical NLP Transformers

This is an example piper file that will spin up a complete PBJ (Python Bridge to Java) pipeline.

(Cnlpt Negation)

graph TD;
        A[Artemis Starter] --> D[Starts a new instance of cTAKES with the given piper parameters]
        A --> E[Pip installs a specified Python package]
        E --> F[Starts a Python process with the given parameters]
        E --> G[Single Sectionizer]
        G --> H[Sentence Detector]
        H --> I[PTB Tokenizer]
        I --> J[Fast dictionary lookup]
        J --> K[Sends the JCas to the Artemis queue]
        K --> L[Add the Finished Logger for some run statistics]
        L --> M[Forcibly Exits cTAKES]
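
A rough piper-file sketch of this flow is shown below. The piper commands themselves (add, addLast, load) are standard, but the PBJ-specific annotator names are paraphrased from the diagram labels above and may not match the exact class names shipped in ctakes-pbj, so treat this as an illustration rather than the distributed file.

    // Start an Artemis broker and a new cTAKES instance with the given piper parameters.
    add ArtemisStarter
    // pip install the required Python package and launch the Python process (names paraphrased from the diagram).
    add PbjPipInstaller
    add PbjRunner
    // Standard cTAKES annotators producing the annotations the Python side consumes.
    add SimpleSegmentAnnotator
    add SentenceDetector
    add TokenizerAnnotatorPTB
    // Fast UMLS dictionary lookup.
    load DictionarySubPipe
    // Send each JCas to the Artemis queue read by the Python negation model (name paraphrased).
    add PbjSender
    // Log run statistics and forcibly exit cTAKES when the collection is finished.
    addLast FinishedLogger
    addLast ExitForcer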
        

Degree-of and location-of relations, events, times, temporal relations, document creation time relations and coreferences.

(Default Advanced Pipeline)

graph TD;
    G[Single Sectionizer]
    G --> H[Sentence Detector]
    H --> I[PTB Tokenizer]
    I --> J[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
    J --> K[Part of Speech Tagger]
    K --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
    L --> M[Default fast dictionary lookup]
    M --> N[Adds Semantic Roles Relations]
    N --> O[Annotates Modifiers and Chunks]
    O --> P[Degree of Annotator]
    P --> Q[Location of Annotator]
    Q --> R[Event Annotator]
    R --> S[Annotates absolute time / date Temporal expressions]
    S --> T[Annotates event relativity to document creation time]
    T --> U[Annotates Temporal Events]
    U --> V[Creates Event - Event TLinks]
    V --> W[Adds Terminal Treebank Nodes, necessary for Coreference Markables]
    W --> X[Deterministic Markable Annotator]
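
The stock piper files are built by composition: a top-level piper loads smaller sub-pipeline pipers and then adds the remaining annotators. Below is a sketch of how a pipeline like the one above is typically assembled; the sub-pipe names (DefaultTokenizerPipeline, ChunkerSubPipe, DictionarySubPipe, AttributeCleartkSubPipe, TemporalSubPipe, CorefSubPipe) follow the pipers bundled with recent cTAKES releases, but exact names, and whether a given step uses add or addDescription, vary by version, so check the pipers in your installation.

    // Segment, sentence and token annotators.
    load DefaultTokenizerPipeline
    // Context-dependent tokens, part-of-speech tags and chunks.
    add ContextDependentTokenizerAnnotator
    addDescription POSTagger
    load ChunkerSubPipe
    // Fast UMLS dictionary lookup plus ClearTK attribute annotators.
    load DictionarySubPipe
    load AttributeCleartkSubPipe
    // Modifier, degree-of and location-of relation extractors.
    addDescription ModifierExtractorAnnotator
    addDescription DegreeOfRelationExtractorAnnotator
    addDescription LocationOfRelationExtractorAnnotator
    // Events, times, document creation time relations and TLinks.
    load TemporalSubPipe
    // Coreference markables and mention-cluster resolution.
    load CorefSubPipe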


Pipeline with coreference resolution.

(Default Coref Pipeline)

graph TD;
    W[Adds Terminal Treebank Nodes, necessary for Coreference Markables]
    W --> X[Deterministic Markable Annotator]


Commands and parameters to create a plaintext document processing pipeline with UMLS entity lookup.

(Default Fast Pipeline)

graph TD;
    G[Single Sectionizer]
    G --> H[Sentence Detector]
    H --> I[PTB Tokenizer]
    I --> J[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
    J --> K[Part of Speech Tagger]
    K --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
    L --> M[Default fast dictionary lookup]
    M --> N[Adds Semantic Roles Relations]
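
A sketch of the fast pipeline piper, close to the DefaultFastPipeline.piper distributed with cTAKES (the exact contents vary a little between releases):

    // Write a big banner at start and finish.
    set WriteBanner=yes
    // Simple segment, sentence detection and PTB tokenization.
    load DefaultTokenizerPipeline
    // Context-dependent tokens and part-of-speech tags.
    add ContextDependentTokenizerAnnotator
    addDescription POSTagger
    // Shallow parsing (chunks).
    load ChunkerSubPipe
    // Fast UMLS dictionary lookup.
    load DictionarySubPipe
    // ClearTK attribute (assertion) annotators.
    load AttributeCleartkSubPipe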


Pipeline with degree-of and location-of relations and coreference resolution.

(Default Relation Coref Pipeline)

graph TD;
    G[Single Sectionizer]
    G --> H[Sentence Detector]
    H --> I[PTB Tokenizer]
    I --> J[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
    J --> K[Part of Speech Tagger]
    K --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
    L --> M[Default fast dictionary lookup]
    M --> N[Adds Semantic Roles Relations]
    N --> O[Annotates Modifiers and Chunks]
    O --> P[Degree of Annotator]
    P --> Q[Location of Annotator]
    Q --> W[Adds Terminal Treebank Nodes, necessary for Coreference Markables]
    W --> X[Deterministic Markable Annotator]


Clinical Pipeline with degree-of and location-of relations.

(Default Relation Pipeline)

graph TD;
    G[Single Sectionizer]
    G --> H[Sentence Detector]
    H --> I[PTB Tokenizer]
    I --> J[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
    J --> K[Part of Speech Tagger]
    K --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
    L --> M[Default fast dictionary lookup]
    M --> N[Adds Semantic Roles Relations]
    N --> O[Annotates Modifiers and Chunks]
    O --> P[Degree of Annotator]
    P --> Q[Location of Annotator]
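
The diagrams on this page show only the annotator chain. To actually process documents, a piper normally has a collection reader in front of the chain and a writer after it; a minimal sketch, with placeholder directory paths:

    // Read plain-text notes from a directory tree (readFiles is the piper shortcut for the file-tree reader).
    readFiles /path/to/notes
    // Run one of the pipelines on this page.
    load DefaultRelationPipeline
    // Write one XMI file per note (writeXmis is the piper shortcut for the XMI writer).
    writeXmis /path/to/output

Alternatively, leave the reader and writer out of the piper and supply them when launching the runner script (bin/runPiperFile.sh), which takes the piper name and input/output directories as flags; the flag names, particularly the UMLS credential flag, differ between releases, so check the script in your distribution.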


Clinical Pipeline with degree-of, location-of, events, times, temporal and event-doc creation time relations.

(Default Relation Temporal Pipeline)

graph TD;
    G[Single Sectionizer]
    G --> H[Sentence Detector]
    H --> I[PTB Tokenizer]
    I --> J[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
    J --> K[Part of Speech Tagger]
    K --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
    L --> M[Default fast dictionary lookup]
    M --> N[Adds Semantic Roles Relations]
    N --> O[Annotates Modifiers and Chunks]
    O --> P[Degree of Annotator]
    P --> Q[Location of Annotator]
    Q --> R[Event Annotator]
    R --> S[Annotates absolute time / date Temporal expressions]
    S --> T[Annotates event relativity to document creation time]
    T --> U[Annotates Temporal Events]
    U --> V[Creates Event - Event TLinks]


Pipeline with events, times, temporal relations, document creation time relations and coreferences.

(Default Temporal Coref Pipeline)

graph TD;
    G[Single Sectionizer]
    G --> H[Sentence Detector]
    H --> I[PTB Tokenizer]
    I --> J[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
    J --> K[Part of Speech Tagger]
    K --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
    L --> M[Default fast dictionary lookup]
    M --> N[Adds Semantic Roles Relations]
    N --> R[Event Annotator]
    R --> S[Annotates absolute time / date Temporal expressions]
    S --> T[Annotates event relativity to document creation time]
    T --> U[Annotates Temporal Events]
    U --> V[Creates Event - Event TLinks]
    V --> W[Adds Terminal Treebank Nodes, necessary for Coreference Markables]
    W --> X[Deterministic Markable Annotator]


Clinical Pipeline with events, times, event-event and event-time relations plus event-document creation time relations.

(Default Temporal Pipeline)

graph TD;
    G[Single Sectionizer]
    G --> H[Sentence Detector]
    H --> I[PTB Tokenizer]
    I --> J[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
    J --> K[Part of Speech Tagger]
    K --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
    L --> M[Default fast dictionary lookup]
    M --> N[Adds Semantic Roles Relations]
    N --> R[Event Annotator]
    R --> S[Annotates absolute time / date Temporal expressions]
    S --> T[Annotates event relativity to document creation time]
    T --> U[Annotates Temporal Events]
    U --> V[Creates Event - Event TLinks]


Commands and parameters for a small tokenization pipeline.

(Default Tokenizer Pipeline)

graph TD;
    G[Single Sectionizer]
    G --> H[Sentence Detector]
    H --> I[PTB Tokenizer]
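
This small pipeline corresponds almost line-for-line to a piper; a sketch close to the DefaultTokenizerPipeline.piper shipped with cTAKES:

    // Write a big banner when starting and finishing.
    set WriteBanner=yes
    // Single default segment covering the whole document.
    add SimpleSegmentAnnotator
    // Sentence detection (OpenNLP model).
    add SentenceDetector
    // Penn Treebank style tokenization.
    add TokenizerAnnotatorPTB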


Commands and parameters for a small tokenization pipeline with sections, paragraphs and lists.

(Full Tokenizer Pipeline)

graph TD;
    A[Annotates Document Sections by detecting Section Headers using Regular Expressions provided in a Bar-Separated-Value BSV File] --> B[Annotates Sentences based upon an OpenNLP model]
    B --> C[Paragraphs are parsed using empty lines as separators]
    C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
    D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
    E --> F[Fix sentences so that no sentence spans across two or more list entries]
    F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
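
A sketch of the corresponding piper, using class names from ctakes-core; the list below mirrors the diagram, but the exact FullTokenizerPipeline.piper in your release may differ slightly:

    // Sections from regex section headers defined in a BSV file.
    add BsvRegexSectionizer
    // Sentences, then paragraphs from empty-line separators, then re-fit sentences to paragraphs.
    add SentenceDetector
    add ParagraphAnnotator
    add ParagraphSentenceFixer
    // Formatted lists and tables (regexes written for the Pitt notes), then re-fit sentences to list entries.
    add ListAnnotator
    add ListSentenceFixer
    // Tokenize, tag parts of speech and chunk using the adjusted sentences.
    add TokenizerAnnotatorPTB
    add ContextDependentTokenizerAnnotator
    addDescription POSTagger
    load ChunkerSubPipe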

This is a piper file that performs the initial steps required to run a ctakes-pbj pipeline.

(Pbj Starter)

graph TD;
    A[Starts an Artemis Broker instance] --> B[Pip installs the dependency packages in case your environment doesn't have them or needs an update]
    B --> C[Add the Finished Logger for some run statistics]
    C --> D[Force a stop, just in case some external process is trying to stay connected]
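
Because these steps are needed at the start of every PBJ run, other PBJ pipers typically pull them in with a single load rather than repeating them, and the stopper described in the next section plays the matching role at the end. A sketch of that usage, in which user_pipeline.piper is a hypothetical file name and the stopper piper name is assumed to match its heading below:

    // user_pipeline.piper: start the broker and pip the PBJ dependencies.
    load PbjStarter
    // ... the rest of the PBJ pipeline goes here ...
    // Stop the broker, log statistics and force an exit.
    load PbjStopper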

This is a piper file that performs the final steps required to stop a ctakes-pbj pipeline.

(Pbj Stopper)

graph TD;
    A[Stop the Artemis Broker] --> B[Add the Finished Logger for some run statistics]
    B --> C[Force a stop, just in case some external process is trying to stay connected]

ADVANCED PIPELINES

Pipeline with section, paragraph and list detection, degree-of and location-of relations ...

(Sectioned Advanced Pipeline)

graph TD;
    A[Annotate sections by known regex] --> B[Sentence Detector]
    B --> C[Paragraph Annotator]
    C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
    D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
    E --> F[Fix sentences so that no sentence spans across two or more list entries]
    F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
    G --> H[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
    H --> I[Annotate Parts of Speech]
    I --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
    L --> M[Default fast dictionary lookup]
    M --> N[Adds Semantic Roles Relations]
    N --> O[Annotates Modifiers and Chunks]
    O --> P[Annotates Degree Of relations]
    P --> Q[Annotates Location Of relations]
    Q --> R[Annotates Temporal Events]
    R --> S[Annotates absolute time / date Temporal expressions]
    S --> T[Annotates event relativity to document creation time]
    T --> U[Creates Event - Time TLinks]
    U --> V[Creates Event - Event TLinks]
    V --> W[Adds Terminal Treebank Nodes, necessary for Coreference Markables]
    W --> X[Deterministic Markable Annotator]
    X --> Y[Annotates Markable Salience]
    Y --> Z[MentionClusterCoreferenceAnnotator]


Pipeline with section, paragraph and list detection and coreference resolution.

(Sectioned Coref Pipeline)

graph TD;
    A[Annotate sections by known regex] --> B[Sentence Detector]
    B --> C[Paragraph Annotator]
    C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
    D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
    E --> F[Fix sentences so that no sentence spans across two or more list entries]
    F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
    G --> H[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
    H --> I[Annotate Parts of Speech]
    I --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
    L --> M[Default fast dictionary lookup]
    M --> N[Adds Semantic Roles Relations]
    N --> W[Adds Terminal Treebank Nodes, necessary for Coreference Markables]
    W --> X[Deterministic Markable Annotator]
    X --> Y[Annotates Markable Salience]
    Y --> Z[MentionClusterCoreferenceAnnotator]

Commands and parameters to create a plaintext document processing pipeline with sections, paragraphs and lists.

(Sectioned Fast Pipeline)

graph TD;
    A[Annotate sections by known regex] --> B[Sentence Detector]
    B --> C[Paragraph Annotator]
    C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
    D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
    E --> F[Fix sentences so that no sentence spans across two or more list entries]
    F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
    G --> H[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
    H --> I[Annotate Parts of Speech]
    I --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
    L --> M[Default fast dictionary lookup]
    M --> N[Adds Semantic Roles Relations]
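
These sectioned pipers can be extended the same way as the default ones: a user piper loads one of them and appends its own annotators. A minimal sketch, in which org.example.mynlp and MyCustomAnnotator are hypothetical placeholders and the loaded piper name is assumed to match the heading above with spaces removed:

    // Make a user package visible to the piper reader (hypothetical package name).
    package org.example.mynlp
    // Run the sectioned fast pipeline, then a custom annotator from that package (hypothetical class).
    load SectionedFastPipeline
    add MyCustomAnnotator
    // Write results as XMI.
    writeXmis /path/to/output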

Pipeline with section, paragraph and list detection, degree-of and location-of relations and coreferences.

(Sectioned Relation Coref Pipeline)

graph TD;
    A[Annotate sections by known regex] --> B[Sentence Detector]
    B --> C[Paragraph Annotator]
    C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
    D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
    E --> F[Fix sentences so that no sentence spans across two or more list entries]
    F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
    G --> H[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
    H --> I[Annotate Parts of Speech]
    I --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
    L --> M[Default fast dictionary lookup]
    M --> N[Adds Semantic Roles Relations]
    N --> O[Annotates Modifiers and Chunks]
    O --> P[Annotates Degree Of relations]
    P --> Q[Annotates Location Of relations]
    Q --> W[Adds Terminal Treebank Nodes, necessary for Coreference Markables]
    W --> X[Deterministic Markable Annotator]
    X --> Y[Annotates Markable Salience]
    Y --> Z[MentionClusterCoreferenceAnnotator]

Clinical Pipeline with section, paragraph and list detection and degree-of and location-of relations.

(Section Relation Pipeline)

graph TD;
    A[Annotate sections by known regex] --> B[Sentence Detector]
    B --> C[Paragraph Annotator]
    C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
    D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
    E --> F[Fix sentences so that no sentence spans across two or more list entries]
    F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
    G --> H[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
    H --> I[Annotate Parts of Speech]
    I --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
    L --> M[Default fast dictionary lookup]
    M --> N[Adds Semantic Roles Relations]
    N --> O[Annotates Modifiers and Chunks]
    O --> P[Annotates Degree Of relations]
    P --> Q[Annotates Location Of relations]

Clinical Pipeline with sections, paragraphs, lists, degree-of, location-of, events, times, temporal and event-doc creation time relations.

(Section Relation Temporal Pipeline)

graph TD;
    A[Annotate sections by known regex] --> B[Sentence Detector]
    B --> C[Paragraph Annotator]
    C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
    D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
    E --> F[Fix sentences so that no sentence spans across two or more list entries]
    F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
    G --> H[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
    H --> I[Annotate Parts of Speech]
    I --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
    L --> M[Default fast dictionary lookup]
    M --> N[Adds Semantic Roles Relations]
    N --> O[Annotates Modifiers and Chunks]
    O --> P[Annotates Degree Of relations]
    P --> Q[Annotates Location Of relations]
    Q --> R[Annotates Temporal Events]
    R --> S[Annotates absolute time / date Temporal expressions]
    S --> T[Annotates event relativity to document creation time]
    T --> U[Creates Event - Time TLinks]
    U --> V[Creates Event - Event TLinks]

Pipeline with section, paragraph and list detection, events, times, temporal relations and document creation time relations.

(Sectioned Temporal Coref Pipeline)

graph TD;
    A[Annotate sections by known regex] --> B[Sentence Detector]
    B --> C[Paragraph Annotator]
    C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
    D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
    E --> F[Fix sentences so that no sentence spans across two or more list entries]
    F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
    G --> H[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
    H --> I[Annotate Parts of Speech]
    I --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
    L --> M[Default fast dictionary lookup]
    M --> N[Adds Semantic Roles Relations]
    N --> O[Annotates Modifiers and Chunks]
    O --> P[Annotates Degree Of relations]
    P --> Q[Annotates Location Of relations]
    Q --> R[Annotates Temporal Events]
    R --> S[Annotates absolute time / date Temporal expressions]
    S --> T[Annotates event relativity to document creation time]
    T --> U[Creates Event - Time TLinks]
    U --> V[Creates Event - Event TLinks]

Clinical Pipeline with sections, paragraphs, lists, events, times, temporal and event-doc creation time relations.

(Sectioned Temporal Pipeline)

graph TD;
    A[Annotate sections by known regex] --> B[Sentence Detector]
    B --> C[Paragraph Annotator]
    C --> D[Fix sentences so that no sentence spans across two or more paragraphs]
    D --> E[Use regular expressions created for the Pitt notes to discover formatted lists and tables]
    E --> F[Fix sentences so that no sentence spans across two or more list entries]
    F --> G[Now we can finally tokenize, tag parts of speech and chunk using adjusted sentences]
    G --> H[Finds tokens based upon context. Time, Date, Roman numeral, Fraction, Range, Measurement, Person title]
    H --> I[Annotate Parts of Speech]
    I --> L[Annotator that generates chunks of any kind as specified by the chunker model and the chunk creator]
    L --> M[Default fast dictionary lookup]
    M --> N[Adds Semantic Roles Relations]
    N --> R[Annotates Temporal Events]
    R --> S[Annotates absolute time / date Temporal expressions]
    S --> T[Annotates event relativity to document creation time]
    T --> U[Creates Event - Time TLinks]
    U --> V[Creates Event - Event TLinks]