Pipeline List - apache/ctakes GitHub Wiki
Several piper files are included in cTAKES that are capable of running typical pipelines,
including the Default Clinical Pipeline.
Create your own piper file to extend these pipelines or build custom pipelines.
There is a flow diagram of the basic pipelines that may also be helpful.
The listed piper files are divided amongst the cTAKES Modules based upon relevancy.
They also all have the filename extension ".piper".
When running piper files included with cTAKES, specifying the module/directory and the .piper extension is not normally required.
Run pipelines defined in piper files using the Piper File Submitter GUI,
the PiperFileRunner Java class,
or on a command line with the runPiperFile script (Binary Build Only).
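As a rough sketch of the command-line route, the Python helper below assembles an argument list for the runPiperFile script. The script location (`bin/runPiperFile.sh`), the `-p`, `-i` and `-o` options, and the directory names are assumptions made for illustration, so check them against the options your cTAKES build actually supports. It also shows the point above that an included piper file can usually be named without its module/directory or the .piper extension.

```python
from pathlib import PurePath
from typing import List


def piper_name(piper: str) -> str:
    """Reduce a reference to an included piper file to its bare name.

    Drops any leading directory and a trailing ".piper" extension.
    """
    name = PurePath(piper).name
    if name.endswith(".piper"):
        name = name[: -len(".piper")]
    return name


def build_command(script: str, piper: str, input_dir: str, output_dir: str) -> List[str]:
    """Build an argument list for a piper-file run.

    The -p / -i / -o option names are assumptions, not confirmed by this page.
    """
    return [script, "-p", piper_name(piper), "-i", input_dir, "-o", output_dir]


# Hypothetical script path and directories, for illustration only.
command = build_command(
    "bin/runPiperFile.sh", "DefaultFastPipeline.piper", "notes/in", "notes/out"
)
print(" ".join(command))
```

The helper strips the directory and extension only because the piper files listed here ship with cTAKES; a custom piper file would normally be passed by its path instead.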
Basic
Partial
Basic With a Sectionizer
Piper files containing basic pipelines, which do not detect or normalize Sections or Lists.
| Name / Type | Piper File | Description |
|---|---|---|
| Default Clinical | DefaultFastPipeline | A plaintext document processing pipeline with UMLS entity lookup and entity assertion attributes. |
| Default Relation | DefaultRelationPipeline | Default Clinical Pipeline with degree-of and location-of relations. |
| Default Temporal | DefaultTemporalPipeline | Default Clinical Pipeline with events, times, event-event and event-time relations plus event-document creation time relations. |
| Default Coref | DefaultCorefPipeline | Default Clinical Pipeline with coreference resolution. |
| Default Relation Temporal | DefaultRelationTemporalPipeline | Default Clinical Pipeline with degree-of, location-of, events, times, temporal and event-doc creation time relations. |
| Default Relation Coref | DefaultRelationCorefPipeline | Default Clinical Pipeline with degree-of and location-of relations and coreference resolution. |
| Default Temporal Coref | DefaultTemporalCorefPipeline | Default Clinical Pipeline with events, times, temporal relations, document creation time relations and coreferences. |
| Default Full | DefaultAdvancedPipeline | Default Clinical Pipeline with degree-of and location-of relations, events, times, temporal relations, document creation time relations and coreferences. |
| Default Tokenizer | DefaultTokenizerPipeline | A small tokenization pipeline. |
| Cnlpt Negation | CnlptNegation | Example piper file that spins up a complete pbj pipeline. |
Piper files containing partial pipelines that cannot run alone, but are intended to be included in full pipelines using the piper load command.
| Name / Type | Piper File | Description |
|---|---|---|
| Dictionary Sub-pipe | DictionarySubPipe | A dictionary lookup sub-pipeline. |
| Assertion Sub-pipe | AssertionSubPipe | A default entity attributes processing sub-pipeline. |
| Attribute Cleartk Sub-pipe | AttributeCleartkSubPipe | A default entity attributes processing sub-pipeline. |
| Windowed Attribute Cleartk Sub-pipe | WindowedAttributeCleartkSubPipe | A default entity attributes processing sub-pipeline for large files. This is not a full pipeline. |
| Ne Contexts Sub-pipe | NeContextsSubPipe | Partial pipeline to add assertion (e.g. negation, uncertainty) attributes based upon context. |
| Chunker Sub-pipe | ChunkerSubPipe | A default chunker processing sub-pipeline. |
| Relation Sub-pipe | RelationSubPipe | A default relation extraction sub-pipeline. |
| Temporal Sub-pipe | TemporalSubPipe | A temporal processing sub-pipeline. |
| Coref Sub-pipe | CorefSubPipe | A default coreference processing sub-pipeline. |
| Pbj Starter | PbjStarter | Performs initial steps required for running a ctakes-pbj pipeline. |
| Pbj Stopper | PbjStopper | Performs final steps required for stopping a ctakes-pbj pipeline. |
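To illustrate how partial pipelines are pulled into a full pipeline with the piper load command, here is a minimal Python sketch that generates the text of a custom piper file. The selection and order of piper files, the `//` comment line, and the helper name `compose_piper` are assumptions for illustration; the result is not a validated pipeline, and a real one would also need any readers, writers, and other engines it depends on.

```python
from typing import Iterable


def compose_piper(piper_names: Iterable[str], comment: str = "") -> str:
    """Return piper-file text with one "load <name>" line per piper file, in order."""
    lines = []
    if comment:
        # Assumes "//" starts a comment line in a piper file.
        lines.append(f"// {comment}")
    lines.extend(f"load {name}" for name in piper_names)
    return "\n".join(lines) + "\n"


# Piper file names taken from the tables on this page; the combination is hypothetical.
text = compose_piper(
    ["DefaultTokenizerPipeline", "ChunkerSubPipe", "DictionarySubPipe", "AttributeCleartkSubPipe"],
    comment="Hypothetical custom pipeline built from included piper files",
)
print(text)
```

Saving the generated text with the ".piper" extension would give a file that can be passed to any of the run methods described earlier, assuming the loaded piper files are available in your cTAKES installation.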
Piper files containing pipelines that detect Sections and Lists.
| Name / Type | Piper File | Description |
|---|---|---|
| Sectionizing Clinical | SectionedFastPipeline | A plaintext document processing pipeline with Sections, paragraphs and lists, UMLS entity lookup and entity assertion attributes. |
| Sectioned Relation | SectionedRelationPipeline | Sectionizing Clinical Pipeline with degree-of and location-of relations. |
| Sectioned Temporal | SectionedTemporalPipeline | Sectionizing Clinical Pipeline with events, times, temporal and event-doc creation time relations. |
| Sectioned Coref | SectionedCorefPipeline | Sectionizing Clinical Pipeline with coreference resolution. |
| Sectioned Relation Temporal | SectionedRelationTemporalPipeline | Sectionizing Clinical Pipeline with degree-of, location-of, events, times, temporal and event-doc creation time relations. |
| Sectioned Relation Coref | SectionedRelationCorefPipeline | Sectionizing Clinical Pipeline with degree-of and location-of relations and coreferences. |
| Sectioned Temporal Coref | SectionedTemporalCorefPipeline | Sectionizing Clinical Pipeline with events, times, temporal relations, document creation time relations and coreferences. |
| Sectioned Full | SectionedAdvancedPipeline | Sectionizing Clinical Pipeline with degree-of and location-of relations ... |
| Sectioned Tokenizer | FullTokenizerPipeline | A small tokenization pipeline with sections, paragraphs and lists. |
There are equivalent versions of several included pipelines that can be run in concurrent threads.
Running these pipelines on a powerful machine with multiple processors may be faster than running a single-threaded pipeline,
but in tests the speed has not increased with the number of threads in a linear fashion.
In other words, running with two threads is not twice the speed of running with a single thread.
The multi-threading capability of cTAKES is added for simplification, not optimization.
Many techniques have been used to run documents through cTAKES more quickly.
Some have used REST services,
external frameworks such as Apache Spark and Apache BEAM,
and cTAKES-PBJ.
There are a few Videos in this wiki covering some high-performance configurations of cTAKES.
You can also find papers on the subject,
such as "Experiences implementing scalable, containerized, cloud-based NLP for extracting biobank participant phenotypes at scale".
Depending upon your circumstances,
the easiest thing to do may be to split your corpus into different directories and run each directory in its own instance of cTAKES.
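The directory-splitting approach can be sketched in a few lines of Python. This is only one possible way to do it: the function name, the `part_N` directory naming, and the round-robin assignment are all assumptions for illustration, and each resulting directory would then be used as the input directory of a separate cTAKES run.

```python
import shutil
from pathlib import Path
from typing import List


def split_corpus(corpus_dir: str, out_root: str, parts: int) -> List[Path]:
    """Copy the files in corpus_dir round-robin into out_root/part_0 .. part_{parts-1}.

    Returns the list of part directories, one per intended cTAKES instance.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    # Sort so that the assignment of files to parts is reproducible.
    files = sorted(p for p in Path(corpus_dir).iterdir() if p.is_file())
    part_dirs = [Path(out_root) / f"part_{i}" for i in range(parts)]
    for part_dir in part_dirs:
        part_dir.mkdir(parents=True, exist_ok=True)
    for index, src in enumerate(files):
        shutil.copy2(src, part_dirs[index % parts] / src.name)
    return part_dirs


# Example with hypothetical paths:
# part_dirs = split_corpus("corpus", "corpus_split", parts=4)
```

Round-robin assignment keeps the number of documents per directory within one of each other, though it does not balance by document size.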
NOTE: You can find external unofficial code repositories that use cTAKES, but they should be used with care.
The Apache cTAKES team will not provide any support for them, they are frequently out of date,
some contain erroneous information, and (in my opinion) many do not utilize the best aspects of cTAKES.
They sometimes smell like the copy-paste "how-to", "ranked list", and other web pages that are woefully prolific on the web.
If you author code related to cTAKES in which you have confidence and that you believe would be useful to others,
please contact the cTAKES team. It may be much more beneficial to the public to have your code in cTAKES itself,
rather than in its own repository.
You will be credited with any contribution,
and you may even be offered a chance to join the cTAKES development team and/or the management committee.
In addition, the cTAKES team can assist with future changes and enhancements.
Thread-Safe Basic
Thread-Safe Partial
Thread-Safe with a Sectionizer
Piper files containing pipelines that do not detect Sections or Lists. Thread-Safe Annotation Engines and Writers are used in these pipelines.
| Name / Type | Piper File | Description |
|---|---|---|
| Thread-Safe Clinical | TsDefaultFastPipeline | A thread-safe plaintext document processing pipeline with UMLS lookup and entity assertion attributes. |
| Thread-Safe Relation | TsDefaultRelationPipeline | Thread-safe Clinical Pipeline with degree-of and location-of relations. |
| Thread-Safe Temporal | TsDefaultTemporalPipeline | Thread-safe Clinical Pipeline with events, times, event-event and event-time relations plus event-document creation time relations. |
| Thread-Safe Coref | TsDefaultCorefPipeline | Thread-safe Clinical Pipeline with coreference resolution. |
| Thread-Safe Relation Temporal | TsDefaultRelationTemporalPipeline | Thread-safe Clinical Pipeline with degree-of and location-of relations, events, times, temporal and event-doc creation time relations. |
| Thread-Safe Relation Coref | TsDefaultRelationCorefPipeline | Thread-safe Clinical Pipeline with degree-of and location-of relations and coreference resolution. |
| Thread-Safe Temporal Coref | TsDefaultTemporalCorefPipeline | Thread-safe Clinical Pipeline with events, times, temporal relations, document creation time relations and coreferences. |
| Thread-Safe Full | TsDefaultAdvancedPipeline | Thread-safe Clinical Pipeline with degree-of and location-of relations, events, times, temporal relations, document creation time relations and coreferences. |
| Thread-Safe Tokenizer | TsDefaultTokenizerPipeline | A small thread-safe tokenization pipeline. |
Piper files containing partial pipelines that cannot run alone, but are intended to be included in full pipelines using the piper load command.
Thread-Safe Annotation Engines and Writers are used in these partial pipelines.
| Name / Type | Piper File | Description |
|---|---|---|
| Thread-Safe Dictionary Sub-pipe | TsDictionarySubPipe | Thread-safe dictionary lookup sub-pipeline. |
| Thread-Safe Attribute Cleartk Sub-pipe | TsAttributeCleartkSubPipe | Thread-safe default entity attributes processing sub-pipeline. |
| Thread-Safe Chunker Sub-pipe | TsChunkerSubPipe | Thread-safe default chunker processing sub-pipeline. |
| Thread-Safe Relation Sub-pipe | TsRelationSubPipe | Thread-safe relation extraction sub-pipeline. |
| Thread-Safe Temporal Sub-pipe | TsTemporalSubPipe | Thread-safe temporal processing sub-pipeline. |
| Thread-Safe Coref Sub-pipe | TsCorefSubPipe | Thread-safe coreference processing sub-pipeline. |
Piper files containing pipelines that detect Sections and Lists. Thread-Safe Annotation Engines and Writers are used in these pipelines.
| Name / Type | Piper File | Description |
|---|---|---|
| Thread-Safe Sectioned Clinical | TsSectionedFastPipeline | A thread-safe plaintext document processing pipeline with UMLS entity lookup and entity assertion attributes, sections, paragraphs and lists. |
| Thread-Safe Sectioned Relation | TsSectionedRelationPipeline | Thread-safe Sectioned Pipeline with degree-of and location-of relations. |
| Thread-Safe Sectioned Temporal | TsSectionedTemporalPipeline | Thread-safe Sectioned Pipeline with events, times, temporal and event-doc creation time relations. |
| Thread-Safe Sectioned Coref | TsSectionedCorefPipeline | Thread-safe Sectioned Pipeline with coreference resolution. |
| Thread-Safe Sectioned Relation Temporal | TsSectionedRelationTemporalPipeline | Thread-safe Sectioned Pipeline with degree-of, location-of, events, times, temporal and event-doc creation time relations. |
| Thread-Safe Sectioned Relation Coref | TsSectionedRelationCorefPipeline | Thread-safe Sectioned Pipeline with degree-of and location-of relations and coreferences. |
| Thread-Safe Sectioned Temporal Coref | TsSectionedTemporalCorefPipeline | Thread-safe Sectioned Pipeline with events, times, temporal relations, document creation time relations and coreferences. |
| Thread-Safe Sectioned Full | TsSectionedAdvancedPipeline | Thread-safe Sectioned Pipeline with all of the above. |
| Thread-Safe Sectioned Tokenizer | TsFullTokenizerPipeline | A small thread-safe tokenization pipeline with sections, paragraphs and lists. |
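The thread-safe piper files listed above appear to follow a simple naming pattern: the single-threaded name with a "Ts" prefix. The Python sketch below looks up a thread-safe equivalent on that assumption. The set holds only a subset of the names from the tables, and the helper `thread_safe_name` is hypothetical, so treat the result as a hint to check against your cTAKES installation rather than a guarantee that the file exists.

```python
from typing import Optional

# A subset of the thread-safe piper files named in the tables on this page.
THREAD_SAFE_PIPERS = {
    "TsDefaultFastPipeline",
    "TsDefaultRelationPipeline",
    "TsDefaultTemporalPipeline",
    "TsDefaultCorefPipeline",
    "TsDefaultAdvancedPipeline",
    "TsDefaultTokenizerPipeline",
    "TsDictionarySubPipe",
    "TsChunkerSubPipe",
    "TsSectionedFastPipeline",
    "TsSectionedAdvancedPipeline",
    "TsFullTokenizerPipeline",
}


def thread_safe_name(piper: str) -> Optional[str]:
    """Return the "Ts"-prefixed equivalent of a piper file, or None if it is not in the set."""
    candidate = "Ts" + piper
    return candidate if candidate in THREAD_SAFE_PIPERS else None


print(thread_safe_name("DefaultFastPipeline"))
print(thread_safe_name("CnlptNegation"))
```

Not every piper file has a thread-safe counterpart listed here, which is why the lookup returns `None` instead of blindly adding the prefix.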