Pipeline List - apache/ctakes GitHub Wiki

Several piper files are included in cTAKES that are capable of running typical pipelines, including the Default Clinical Pipeline.
Create your own piper file to extend these pipelines or build custom pipelines.
There is a flow diagram of the basic pipelines that may also be helpful.
The listed piper files are divided amongst the cTAKES Modules based upon relevancy. They also all have the filename extension ".piper". When running piper files included with cTAKES, specifying the module/directory and the .piper extension is not normally required.

Run pipelines defined in piper files using the Piper File Submitter GUI, the PiperFileRunner Java class, or on a command line with the runPiperFile script (Binary Build Only).
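
As a rough illustration of the command-line route, the sketch below assembles an argument list for the runPiperFile script. The script path `bin/runPiperFile.sh` and the option names `-p` (piper file), `-i` (input directory), `-o` (output directory) and `--key` (UMLS key) are assumptions that do not come from this page, so verify them against your cTAKES build before use. The function name `build_piper_command` is hypothetical.

```python
from typing import List, Optional


def build_piper_command(piper: str, input_dir: str, output_dir: str,
                        umls_key: Optional[str] = None,
                        script: str = "bin/runPiperFile.sh") -> List[str]:
    """Assemble an argument list for the runPiperFile script.

    The script path and the option names (-p, -i, -o, --key) are
    assumptions; check them against your cTAKES installation.
    """
    command = [script, "-p", piper, "-i", input_dir, "-o", output_dir]
    if umls_key is not None:
        # Only pass a UMLS key when the caller supplies one.
        command += ["--key", umls_key]
    return command


if __name__ == "__main__":
    # Print the command rather than run it, since cTAKES may not be installed here.
    print(" ".join(build_piper_command("DefaultFastPipeline", "notes", "out")))
```

The piper file is given by name only, without a module directory or the .piper extension, as described above for piper files included with cTAKES. The resulting list can be handed to `subprocess.run` once the options are confirmed.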

Pipeline Types

Basic
Partial
Basic With a Sectionizer


Basic Pipelines

Piper files containing basic pipelines that do not detect or normalize Sections or Lists.

| Name / Type | Piper File | Description |
| --- | --- | --- |
| Default Clinical | DefaultFastPipeline | A plaintext document processing pipeline with UMLS entity lookup and entity assertion attributes. |
| Default Relation | DefaultRelationPipeline | Default Clinical Pipeline with degree-of and location-of relations. |
| Default Temporal | DefaultTemporalPipeline | Default Clinical Pipeline with events, times, event-event and event-time relations plus event-document creation time relations. |
| Default Coref | DefaultCorefPipeline | Default Clinical Pipeline with coreference resolution. |
| Default Relation Temporal | DefaultRelationTemporalPipeline | Default Clinical Pipeline with degree-of, location-of, events, times, temporal and event-doc creation time relations. |
| Default Relation Coref | DefaultRelationCorefPipeline | Default Clinical Pipeline with degree-of and location-of relations and coreference resolution. |
| Default Temporal Coref | DefaultTemporalCorefPipeline | Default Clinical Pipeline with events, times, temporal relations, document creation time relations and coreferences. |
| Default Full | DefaultAdvancedPipeline | Default Clinical Pipeline with degree-of and location-of relations, events, times, temporal relations, document creation time relations and coreferences. |
| Default Tokenizer | DefaultTokenizerPipeline | A small tokenization pipeline. |
| Cnlpt Negation | CnlptNegation | Example piper file that spins up a complete pbj pipeline. |

Partial Pipelines

Piper files containing partial pipelines that cannot run alone, but are intended to be included in full pipelines using the piper load command.

| Name / Type | Piper File | Description |
| --- | --- | --- |
| Dictionary Sub-pipe | DictionarySubPipe | A dictionary lookup sub-pipeline. |
| Assertion Sub-pipe | AssertionSubPipe | A default entity attributes processing sub-pipeline. |
| Attribute Cleartk Sub-pipe | AttributeCleartkSubPipe | A default entity attributes processing sub-pipeline. |
| Windowed Attribute Cleartk Sub-pipe | WindowedAttributeCleartkSubPipe | A default entity attributes processing sub-pipeline for large files. This is not a full pipeline. |
| Ne Contexts Sub-pipe | NeContextsSubPipe | Partial pipeline to add assertion (e.g. negation, uncertainty) attributes based upon context. |
| Chunker Sub-pipe | ChunkerSubPipe | A default chunker processing sub-pipeline. |
| Relation Sub-pipe | RelationSubPipe | A default relation extraction sub-pipeline. |
| Temporal Sub-pipe | TemporalSubPipe | A temporal processing sub-pipeline. |
| Coref Sub-pipe | CorefSubPipe | A default coreference processing sub-pipeline. |
| Pbj Starter | PbjStarter | Performs initial steps required for running a ctakes-pbj pipeline. |
| Pbj Stopper | PbjStopper | Performs final steps required for stopping a ctakes-pbj pipeline. |
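
To make the role of the piper load command more concrete, here is a minimal sketch that generates the text of a custom piper file which loads a full pipeline and then adds partial pipelines on top of it. The helper name `compose_piper` is hypothetical, the `//` comment syntax is assumed, and the generated text leaves out any writers or other settings a real pipeline would normally add.

```python
from typing import Iterable


def compose_piper(base: str, sub_pipes: Iterable[str], comment: str = "") -> str:
    """Return piper file text that loads `base`, then each sub-pipe in order.

    Assumption: each piper file listed on this page can be pulled in by
    name with a `load` line, without a directory or .piper extension.
    """
    lines = []
    if comment:
        lines.append(f"// {comment}")
    lines.append(f"load {base}")
    for sub_pipe in sub_pipes:
        lines.append(f"load {sub_pipe}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = compose_piper("DefaultFastPipeline",
                         ["TemporalSubPipe", "CorefSubPipe"],
                         comment="Clinical pipeline plus temporal and coref sub-pipes")
    print(text, end="")
```

The text could be saved under a name of your choosing with the .piper extension. Whether a given combination of sub-pipes runs correctly depends on the annotators each one expects upstream, so treat the result as a starting point.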

Pipelines with a Sectionizer

Piper files containing pipelines that detect Sections and Lists.

| Name / Type | Piper File | Description |
| --- | --- | --- |
| Sectionizing Clinical | SectionedFastPipeline | A plaintext document processing pipeline with sections, paragraphs and lists, UMLS entity lookup and entity assertion attributes. |
| Sectioned Relation | SectionedRelationPipeline | Sectionizing Clinical Pipeline with degree-of and location-of relations. |
| Sectioned Temporal | SectionedTemporalPipeline | Sectionizing Clinical Pipeline with events, times, temporal and event-doc creation time relations. |
| Sectioned Coref | SectionedCorefPipeline | Sectionizing Clinical Pipeline with coreference resolution. |
| Sectioned Relation Temporal | SectionedRelationTemporalPipeline | Sectionizing Clinical Pipeline with degree-of, location-of, events, times, temporal and event-doc creation time relations. |
| Sectioned Relation Coref | SectionedRelationCorefPipeline | Sectionizing Clinical Pipeline with degree-of and location-of relations and coreferences. |
| Sectioned Temporal Coref | SectionedTemporalCorefPipeline | Sectionizing Clinical Pipeline with events, times, temporal relations, document creation time relations and coreferences. |
| Sectioned Full | SectionedAdvancedPipeline | Sectionizing Clinical Pipeline with degree-of and location-of relations ... |
| Sectioned Tokenizer | FullTokenizerPipeline | A small tokenization pipeline with sections, paragraphs and lists. |

Thread-safe Pipelines

There are equivalent versions of several included pipelines that can be run in concurrent threads.
Running these pipelines on a powerful machine with multiple processors may be faster than running a single-threaded pipeline, but in tests the speedup has not scaled linearly with the number of threads. In other words, running with two threads is not twice as fast as running with a single thread. The multi-threading capability of cTAKES was added for simplification, not optimization.
Many techniques have been used to run documents through cTAKES more quickly.
Some have used REST services, external frameworks such as Apache Spark and Apache BEAM, and cTAKES-PBJ.
There are a few Videos in this wiki covering some high-performance configurations of cTAKES.
You can also find papers on the subject, such as "Experiences implementing scalable, containerized, cloud-based NLP for extracting biobank participant phenotypes at scale".
Depending upon your circumstances, the easiest thing to do may be to split your corpus into different directories and run each directory in its own instance of cTAKES.
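
The corpus-splitting approach can be sketched as follows. This is a minimal example under stated assumptions: the corpus is a flat directory of files, the files are copied (not moved) round-robin into numbered subdirectories, and the names `split_corpus` and `part_00`, `part_01`, ... are made up for illustration. Each resulting directory would then be used as the input directory of a separate cTAKES instance.

```python
import shutil
from pathlib import Path
from typing import List


def split_corpus(corpus_dir: str, target_root: str, parts: int) -> List[Path]:
    """Copy the files in corpus_dir into `parts` subdirectories of target_root.

    Files are assigned round-robin in sorted order, so the part sizes
    differ by at most one file. Subdirectories of corpus_dir are ignored.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    files = sorted(p for p in Path(corpus_dir).iterdir() if p.is_file())
    part_dirs = [Path(target_root) / f"part_{i:02d}" for i in range(parts)]
    for part_dir in part_dirs:
        part_dir.mkdir(parents=True, exist_ok=True)
    for index, source in enumerate(files):
        # File number `index` goes to part number `index mod parts`.
        shutil.copy2(source, part_dirs[index % parts] / source.name)
    return part_dirs


# Example (hypothetical paths): split_corpus("notes", "notes_split", 4)
```

Round-robin assignment balances the file count per directory but not the document sizes, so directories holding unusually long notes may still take longer to process.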

NOTE: You can find external unofficial code repositories that use cTAKES, but they should be used with care. The Apache cTAKES team will not provide any support for them, they are frequently out of date, some contain erroneous information, and (in my opinion) many do not utilize the best aspects of cTAKES. They sometimes smell like the copy-paste "how-to", "ranked list", and other web pages that are woefully prolific on the web.
If you author code related to cTAKES in which you have confidence and believe would be useful to others, please contact the cTAKES team. It may be much more beneficial to the public to have your code in cTAKES itself, rather than in its own repository. You will be credited with any contribution, and you may even be offered a chance to join the cTAKES development team and/or the management committee. In addition, the cTAKES team can assist with future changes and enhancements.

Thread-Safe Basic
Thread-Safe Partial
Thread-Safe with a Sectionizer

Thread-Safe Basic Pipelines

Piper files containing pipelines that do not detect Sections or Lists. Thread-Safe Annotation Engines and Writers are used in these pipelines.

| Name / Type | Piper File | Description |
| --- | --- | --- |
| Thread-Safe Clinical | TsDefaultFastPipeline | A thread-safe plaintext document processing pipeline with UMLS lookup and entity assertion attributes. |
| Thread-Safe Relation | TsDefaultRelationPipeline | Thread-safe Clinical Pipeline with degree-of and location-of relations. |
| Thread-Safe Temporal | TsDefaultTemporalPipeline | Thread-safe Clinical Pipeline with events, times, event-event and event-time relations plus event-document creation time relations. |
| Thread-Safe Coref | TsDefaultCorefPipeline | Thread-safe Clinical Pipeline with coreference resolution. |
| Thread-Safe Relation Temporal | TsDefaultRelationTemporalPipeline | Thread-safe Clinical Pipeline with degree-of and location-of relations, events, times, temporal and event-doc creation time relations. |
| Thread-Safe Relation Coref | TsDefaultRelationCorefPipeline | Thread-safe Clinical Pipeline with degree-of and location-of relations and coreference resolution. |
| Thread-Safe Temporal Coref | TsDefaultTemporalCorefPipeline | Thread-safe Clinical Pipeline with events, times, temporal relations, document creation time relations and coreferences. |
| Thread-Safe Full | TsDefaultAdvancedPipeline | Thread-safe Clinical Pipeline with degree-of and location-of relations, events, times, temporal relations, document creation time relations and coreferences. |
| Thread-Safe Tokenizer | TsDefaultTokenizerPipeline | A small thread-safe tokenization pipeline. |

Thread-Safe Partial Pipelines

Piper files containing partial pipelines that cannot run alone, but are intended to be included in full pipelines using the piper load command.
Thread-Safe Annotation Engines and Writers are used in these partial pipelines.

| Name / Type | Piper File | Description |
| --- | --- | --- |
| Thread-Safe Dictionary Sub-pipe | TsDictionarySubPipe | Thread-safe dictionary lookup sub-pipeline. |
| Thread-Safe Attribute Cleartk Sub-pipe | TsAttributeCleartkSubPipe | Thread-safe default entity attributes processing sub-pipeline. |
| Thread-Safe Chunker Sub-pipe | TsChunkerSubPipe | Thread-safe default chunker processing sub-pipeline. |
| Thread-Safe Relation Sub-pipe | TsRelationSubPipe | Thread-safe relation extraction sub-pipeline. |
| Thread-Safe Temporal Sub-pipe | TsTemporalSubPipe | Thread-safe temporal processing sub-pipeline. |
| Thread-Safe Coref Sub-pipe | TsCorefSubPipe | Thread-safe coreference processing sub-pipeline. |

Thread-Safe Pipelines with a Sectionizer

Piper files containing pipelines that detect Sections and Lists. Thread-Safe Annotation Engines and Writers are used in these pipelines.

| Name / Type | Piper File | Description |
| --- | --- | --- |
| Thread-Safe Sectioned Clinical | TsSectionedFastPipeline | A thread-safe plaintext document processing pipeline with UMLS entity lookup and entity assertion attributes, sections, paragraphs and lists. |
| Thread-Safe Sectioned Relation | TsSectionedRelationPipeline | Thread-safe Sectioned Pipeline with degree-of and location-of relations. |
| Thread-Safe Sectioned Temporal | TsSectionedTemporalPipeline | Thread-safe Sectioned Pipeline with events, times, temporal and event-doc creation time relations. |
| Thread-Safe Sectioned Coref | TsSectionedCorefPipeline | Thread-safe Sectioned Pipeline with coreference resolution. |
| Thread-Safe Sectioned Relation Temporal | TsSectionedRelationTemporalPipeline | Thread-safe Sectioned Pipeline with degree-of, location-of, events, times, temporal and event-doc creation time relations. |
| Thread-Safe Sectioned Relation Coref | TsSectionedRelationCorefPipeline | Thread-safe Sectioned Pipeline with degree-of and location-of relations and coreferences. |
| Thread-Safe Sectioned Temporal Coref | TsSectionedTemporalCorefPipeline | Thread-safe Sectioned Pipeline with events, times, temporal relations, document creation time relations and coreferences. |
| Thread-Safe Sectioned Full | TsSectionedAdvancedPipeline | Thread-safe Sectioned Pipeline with all of the above. |
| Thread-Safe Sectioned Tokenizer | TsFullTokenizerPipeline | A small thread-safe tokenization pipeline with sections, paragraphs and lists. |