Simple Pipeline Fabricator - apache/ctakes GitHub Wiki
Creating a custom pipeline in cTAKES can be a daunting task. One of the most challenging elements is simply knowing what Pipe Bits and pieces are available to create a pipeline, as well as what data types those Pipe Bits depend upon and produce. The Simple Pipeline Fabricator GUI can help provide this information.
The Simple Pipeline Fabricator can save your pipeline in a reusable Piper File, as well as load and run Piper Files. The tool only assists in creating cTAKES pipelines: it displays information on the available Readers, Annotators and Writers (Pipe Bits), lets you select them, and helps you assign values to Pipe Bit parameters. Creating a working pipeline still requires knowledge of basic NLP artifacts and dataflow.
Before or while using the Simple Pipeline Fabricator GUI, refer to the information on Piper Files.
- From a command line in the cTAKES root directory, execute:
bin/runPiperCreator
- Click Scan to allow the tool to discover all pipeline components (Pipe Bits) available to cTAKES. The scan may take several seconds.
- Select a Pipe Bit on the left. Information on the Pipe Bit will be displayed in the center pane.
- Select a Parameter Name in the center table. Information on the Parameter will be displayed below the table.
- Enter a Value in the table for any parameter that requires one. An 'open file' button makes file selection easy.
- Click Add when all parameter requirements are met. The Pipe Bit will be added to the piper file on the right.
- Continue to Add pipe bits in order to create your pipeline.
- Click Validate at any time. A valid piper file will activate the Run button.
- Click Save to save your piper file for future use.
- Click Run to run your pipeline within the GUI. If your pipeline is complex or you have a large amount of input/output, running with a bin/runPiperFile script is recommended.
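For larger jobs, a piper file saved from the GUI can be handed to the run script from the command line. The following is a minimal sketch, not a definitive invocation: it assumes that `-p`, `-i` and `-o` select the piper file, the input directory and the output directory, and the file and directory names are placeholders. Confirm the options against your cTAKES version before relying on them.

```shell
#!/bin/sh
# Hypothetical names: replace with your saved piper file and your own directories.
PIPER_FILE="my_pipeline.piper"
INPUT_DIR="notes_in"
OUTPUT_DIR="notes_out"

# Assumption: -p = piper file, -i = input directory, -o = output directory.
if [ -x bin/runPiperFile ]; then
  bin/runPiperFile -p "$PIPER_FILE" -i "$INPUT_DIR" -o "$OUTPUT_DIR"
  STATUS=$?
else
  # The script is only found when this is started from the cTAKES root directory.
  echo "bin/runPiperFile not found; start from the cTAKES root directory" >&2
  STATUS=127
fi
echo "runPiperFile exit status: $STATUS"
```

The guard around the call only makes the failure mode explicit when the script is started from the wrong directory.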
There are four basic types of Pipe Bit: Reader, Annotator, Writer, and Utility.
Pipe Bits in the Available Pipe Bit list have icons indicating the datatypes they depend upon and the datatypes they produce. Icons to the left of the Pipe Bit name are dependencies. Icons to the right are products.
The Available Pipe Bit list sorts pipe bits by complexity of datatype dependencies. Bits in your pipeline should follow the same general order.
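The ordering rule can be illustrated with a small piper file in which each Pipe Bit's dependencies are produced by an earlier line. This is a sketch under assumptions: the file name is hypothetical, and the dependencies noted in the comments reflect a common reading of these core cTAKES components rather than output copied from the tool. Check the icons in the Available Pipe Bit list for the authoritative datatypes.

```shell
#!/bin/sh
# Hypothetical file name used only for this illustration.
PIPER_FILE="ordered_example.piper"

# Write a small piper file; lines starting with // are piper comments.
cat > "$PIPER_FILE" <<'EOF'
// Reader first: supplies the documents.
reader FileTreeReader
// Assumed to produce sections with no annotation dependencies.
add SimpleSegmentAnnotator
// Assumed to depend on sections and produce sentences.
add SentenceDetector
// Assumed to depend on sentences and produce tokens.
add TokenizerAnnotatorPTB
// Writer last: consumes what the annotators produced.
writeXmis
EOF

# Print only the command lines, numbered, to review the order at a glance.
grep -v '^//' "$PIPER_FILE" | cat -n
```

Loading a file like this into the Piper File editor and clicking Validate is a quick way to check that the order is acceptable.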
Older (pre-uimaFIT™) Pipe Bits will not display parameter information. Use with care.
Unrecognized pipe bits without @PipeBitInfo Java annotations will have a grey background in the Available Pipe Bit list. Use with care.
The Piper File editor on the right has some simple text coloring. If a command is invalid it will be displayed in underlined red.
Lines added by the tool are verbose. You can manually edit or rearrange them to improve the piper file. Click Validate to confirm a good edit.
Occasionally Validate will time out, and a message will appear in the bottom log panel. If this happens, click Validate again.
General datatypes and their associated icons in the Available Pipe Bit list