Piper Files
Create custom pipelines to extract more information than is available through the Default Clinical Pipeline. Specialized Analysis Engines are distributed across various cTAKES modules, and they can be added to or removed from pipelines to obtain the desired results.
There are four methods available to create custom pipelines.
- XML descriptor files are the original method used to create pipelines in Apache UIMA™. Though self-descriptive, they are verbose and error-prone.
- uimaFIT™ enables creation of pipelines through Java code. This greatly simplifies unit testing and experimentation.
- The PipelineBuilder class in ctakes-core is a facade for uimaFIT™ factories and objects.
- Piper files are a modern equivalent of the XML descriptor files. Piper files list basic commands and parameters in a flat format.
Command | Parameter 1 | Parameters 2-n | Description |
---|---|---|---|
package | package path | | Add to known packages. Shortens load and add specifications. |
load | Piper file path | | Load external piper file. |
set | name=value | <name=value ...> | Add global parameter values. |
cli | name=char | <name=char ...> | Add global parameter values based upon command-line character option values. |
reader | CR name | <name=value ...> | Set the collection reader for pipeline input data. |
add | AE or CC name | <name=value ...> | Add AE/CC to pipeline. |
addDescription | AE or CC name | <value ...> | Add AE/CC to pipeline using its .createAnnotatorDescription method. |
addLogged | AE or CC name | <name=value ...> | Add AE/CC to pipeline with Start/Finish logging. |
addLast | AE or CC name | <name=value ...> | Add AE/CC to the end of pipeline. Useful if the pipeline is meant to be extended. |
writeXmis | output directory | | Add XMI writer to the pipeline. |
// # ! | comment text | | Line Comment. |
Table 1. Standard Piper commands. A complete runnable pipeline can be created using only add commands.
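
As a rough illustration of the commands in Table 1, a small piper file might look like the sketch below. It assumes that FileTreeReader, SimpleSegmentAnnotator, SentenceDetector and TokenizerAnnotatorPTB are available in the standard cTAKES packages of your distribution. The input and output paths are placeholders.

```text
// Minimal sketch of a piper file. Component availability is an assumption.

// Read documents from a directory tree.
reader FileTreeReader InputDirectory=my/input

// Add analysis engines in the order in which they should run.
add SimpleSegmentAnnotator
add SentenceDetector
add TokenizerAnnotatorPTB

// Write the results as XMI files.
writeXmis my/output
```

Each non-comment line is one command followed by its parameters, so the file reads top to bottom in pipeline order.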
- Create an empty text file. The standard file extension for piper files is `.piper`.
- Set a reader for your pipeline. To set values for parameters used by the reader class, add one or more `name=value` pairs after the class name.
- Use add to place analysis engines and CAS consumers in your pipeline. To set values for parameters used by the analysis engine class, add one or more `name=value` pairs after the class name.
- Use load to include common groups of components from another piper file. See Table 2 for piper files in cTAKES.
- The reader, load and add* commands all take class names or file paths as their first parameter. If the class is not in a standard cTAKES module's cr, ae or cc package, or a piper file is not in a standard module's pipeline/ directory, then the package or path must be specified for that component or file.
- Use package to simplify adding multiple pipeline components from a package not standard to cTAKES.
- Use set to assign a value to a parameter used by the components that follow.
  * A `name=value` pair on a component line overrides a set parameter value for that component.
- cli is a special type of set that sets a parameter to a value entered by the user on the command line.
  * cli can only be used with the PiperFileRunner class, the bin/runPiperFile script or the Piper File Submitter GUI.
  * Reserved parameters unavailable for cli are listed in Table 3.
- addDescription is a special type of add that uses a component's static `createAnnotatorDescription(..)` method.
  * Use with care, as not all components have such a method.
- Use addLogged to ensure a component's start and finish times are logged. This is useful for debugging and profiling some components.
- Use addLast to ensure that a component, such as a writer, executes at the end of a pipeline. Multiple components can be added with addLast.
  * writeXmis is a convenience command. "**writeXmis** my/output" is equivalent to "**add** FileTreeXmiWriter OutputDirectory=my/output".
- `name=value` pairs can accept comma-delimited arrays: `ArrayParm=this,is,an,array`
  * Text enclosed in quotes is not an array: `NotArrayParm="this,is,just,text"`
- To run a piper file from the command line, execute the script `bin/runPiperFile -p path/to/piper`
- To run a piper file from code, use the `main(..)` method of `PiperFileRunner` in ctakes-core, or more directly use the `PiperFileReader` class in ctakes-core.
- There are examples of piper file use in the ctakes-examples module.
- A piper file can also be loaded and run by the Simple Pipeline Fabricator GUI and the Piper File Submitter GUI.
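
The parameter rules in the list above can be sketched in a single piper fragment. Everything under `org.example` is hypothetical, as are the component names MyCustomAnnotator and MyOtherAnnotator and their parameters. Only FileTreeXmiWriter, OutputDirectory and DefaultFastPipeline come from this page.

```text
// Sketch only: org.example.* names and their parameters are hypothetical.

// Register a non-standard package so its components can be named without a full path.
package org.example.pipeline.ae

// Global value for any following component that uses an OutputDirectory parameter.
set OutputDirectory=my/output

// Reuse a group of components defined in another piper file.
load DefaultFastPipeline

// Found through the package command above. Sections receives an array of three values.
add MyCustomAnnotator Sections=history,medications,allergies

// Quoted text is passed as a single string, not as an array. Start and finish are logged.
addLogged MyOtherAnnotator Label="history,medications,allergies"

// Runs at the end of the pipeline. The name=value pair overrides the earlier set
// for this component only.
addLast FileTreeXmiWriter OutputDirectory=my/other/output
```

This fragment has no reader line, so it would need an input source supplied elsewhere before it could run.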
This wiki contains a list of standard piper files distributed with cTAKES.
Diagram 1. Piper files used in the cTAKES default Clinical Pipeline. Upper left is DefaultFastPipeline.piper
cli | Equivalent Parameter Name | Description |
---|---|---|
-p | Piper | Location of a Piper file. |
-i | InputDirectory | Directory for all input files. |
-o | OutputDirectory | Directory for all output files. |
-s | SubDirectory | Subdirectory for files. |
-l | LookupXml | Path to fast dictionary lookup xml. |
--key | umlsKey | UMLS user key. |
Table 3. Standard cli characters and their corresponding parameter names.
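
A custom cli binding might look like the sketch below. The parameter name Threshold, the option character t and the component MyThresholdAnnotator are all hypothetical, and the exact command-line form of a custom option is an assumption. The -p, -i and -o options come from Table 3.

```text
// Sketch only: the parameter name, option character and component are hypothetical.

// Bind the command-line option character t to the parameter named Threshold.
cli Threshold=t

// A following component that uses a Threshold parameter would receive the command-line value.
add org.example.pipeline.ae.MyThresholdAnnotator

// Possible invocation (assumed form for the custom option):
//   bin/runPiperFile -p path/to/piper -i my/input -o my/output -t 0.5
```

The option character should be one that Table 3 does not already reserve.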