Getting Started with PBJ - apache/ctakes GitHub Wiki

Prerequisites:

  • Apache cTAKES
  • Python virtual environment
  • Apache Artemis broker

Instructions for creating an Artemis broker:

  1. Download Apache ActiveMQ Artemis here: https://activemq.apache.org/components/artemis/download/
  2. In a terminal, navigate to the apache-artemis folder
  3. From that folder, run ./bin/artemis create [name of your broker] on Mac and Linux, or bin\artemis create [name of your broker] on Windows.
  • You will be prompted to create a username and password, and to choose whether to allow anonymous access (--allow-anonymous). Press Y for this option.

You should now have an Artemis broker and can run PBJ.

For more information, see the Apache ActiveMQ Artemis documentation.
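
The broker-creation command from step 3 can also be assembled in a script, which avoids the interactive prompts. The following is a minimal sketch, not part of cTAKES: the helper name artemis_create_command is hypothetical, and the --user, --password and --allow-anonymous options are assumed to match your Artemis version (run the artemis help create command to confirm).

```python
import platform


def artemis_create_command(broker_name, user=None, password=None,
                           allow_anonymous=True, windows=None):
    """Return the `artemis create` command as an argument list.

    Hypothetical helper for illustration. The command is meant to be run
    from the apache-artemis folder, as in step 3 above.
    """
    if windows is None:
        windows = platform.system() == "Windows"
    # Launcher path differs between Windows and Mac/Linux.
    launcher = r"bin\artemis" if windows else "./bin/artemis"
    cmd = [launcher, "create"]
    if user is not None:
        cmd += ["--user", user]
    if password is not None:
        cmd += ["--password", password]
    if allow_anonymous:
        # Equivalent to pressing Y at the anonymous-access prompt.
        cmd.append("--allow-anonymous")
    cmd.append(broker_name)
    return cmd


if __name__ == "__main__":
    # "mybroker" is a placeholder name; pass the list to subprocess.run to execute it.
    print(" ".join(artemis_create_command("mybroker", windows=False)))
```

Returning a list rather than a single string lets the command be passed to subprocess.run without shell quoting concerns.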

Run Your First Example

For this example we will use the Piper File Submitter. You can also run the example using a shell script with a cTAKES installation if you prefer.

  1. Start the Piper File Submitter.
  2. Load the piper file in the ctakes-examples module called PbjSentencePrinter.
  • The file path is ctakes-examples/src/user/resources/org/apache/ctakes/examples/pipeline/PbjSentencePrinter.piper
    • The GUI should have loaded the piper file and appear as below.
  3. In the parameter table, set parameter values for your system.

     Parameter Name     Option      Value
     InputDirectory     -i          Location for input.
     OutputDirectory    -o          Location for output.
     ArtemisBroker      -a          Location of your Artemis Broker.
     VirtualEnv         -v          Location of your Python virtual environment.
     PipPbj             --pipPbj    Run Python pip on PBJ.

An example from my system is below.
The value for input directory is set to the example notes distributed with cTAKES:
ctakes-examples/src/user/resources/org/apache/ctakes/examples/notes/annotated/
The first time you run PBJ, set PipPbj to 'yes' so that pip installs the PBJ code and obtains the required libraries.

  4. Click the RUN button.
  • You should see run output similar to that in the image below.
  • When the run is complete you should see output similar to that in the image below.
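
If you run the example from a script instead of the GUI, the same parameters are passed through the options listed in the parameter table. The following is a minimal sketch that only collects those options into an argument list. The function name and the placeholder paths are hypothetical, and the launcher script that receives the arguments is deliberately left out because it depends on your cTAKES installation.

```python
def pbj_option_args(input_dir, output_dir, artemis_broker, virtual_env,
                    pip_pbj="yes"):
    """Collect the PBJ Sentence Printer options into an argument list.

    Hypothetical helper: the option names come from the parameter table,
    everything else is illustrative.
    """
    return [
        "-i", input_dir,         # InputDirectory
        "-o", output_dir,        # OutputDirectory
        "-a", artemis_broker,    # ArtemisBroker
        "-v", virtual_env,       # VirtualEnv
        "--pipPbj", pip_pbj,     # PipPbj: 'yes' on the first run
    ]


if __name__ == "__main__":
    # All paths below are placeholders, not real locations.
    args = pbj_option_args(
        "/path/to/input",
        "/path/to/output",
        "/path/to/artemis/broker",
        "/path/to/venv",
    )
    print(" ".join(args))
```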

NOTE: Any PBJ pipeline is composed of two or more sub-pipelines. The PBJ Sentence Printer has three.
What you see in the GUI is only the output of the first sub-pipeline.
The first sub-pipeline may finish long before the second and third complete, so do not be surprised if a Python process or a second cTAKES (Java) process is still running. The run time of these processes depends on their complexity.
You can check the progress of the sub-pipelines (and therefore of the entire PBJ pipeline) by inspecting their output for completion.

PBJ Output

Log Files

Any PBJ run will create 4 log files in your specified output directory.

ctakes_artemis_start.log will contain run information from the Apache Artemis broker.
ctakes_artemis_stop.log may contain additional run information from the broker, but is normally empty.
ctakes_PbjThirdStep.log contains output from the cTAKES pipeline that ran after the Python pipeline.
sentence_printer_pipeline.log contains output from the Python pipeline.
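
One way to confirm that a run produced the four log files listed above is to check the output directory from a script. The following is a minimal sketch: the function name log_status is hypothetical, and the file names are those of the PBJ Sentence Printer example.

```python
from pathlib import Path

# Log file names taken from the PBJ Sentence Printer example above.
EXPECTED_LOGS = (
    "ctakes_artemis_start.log",
    "ctakes_artemis_stop.log",
    "ctakes_PbjThirdStep.log",
    "sentence_printer_pipeline.log",
)


def log_status(output_dir):
    """Map each expected log name to its size in bytes, or None if absent."""
    out = Path(output_dir)
    status = {}
    for name in EXPECTED_LOGS:
        path = out / name
        status[name] = path.stat().st_size if path.is_file() else None
    return status


if __name__ == "__main__":
    # "/path/to/output" is a placeholder for your OutputDirectory value.
    for name, size in log_status("/path/to/output").items():
        print(name, "missing" if size is None else f"{size} bytes")
```

A size of 0 for ctakes_artemis_stop.log is expected, since that file is normally empty.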

sentence_printer_pipeline.log should have contents similar to:

ctakes_PbjThirdStep.log should have contents similar to:

Per-Document Output Files

The PBJ Sentence Printer example places per-document output in three directories:
html/
table/
text/

As the Python pipeline of the PbjSentencePrinter example only detects sentences, the best output files for inspection are those in the text/ directory.
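
To see at a glance whether per-document output was written, you can count the files in each of the three directories. The following is a minimal sketch, assuming the directories sit directly under your output directory. The function name count_output_files is hypothetical.

```python
from pathlib import Path

# Per-document output directories of the PBJ Sentence Printer example.
OUTPUT_SUBDIRS = ("html", "table", "text")


def count_output_files(output_dir):
    """Return the number of regular files in each per-document directory.

    A directory that does not exist is reported with a count of 0.
    """
    counts = {}
    for name in OUTPUT_SUBDIRS:
        sub = Path(output_dir) / name
        if sub.is_dir():
            counts[name] = sum(1 for p in sub.iterdir() if p.is_file())
        else:
            counts[name] = 0
    return counts


if __name__ == "__main__":
    # "/path/to/output" is a placeholder for your OutputDirectory value.
    print(count_output_files("/path/to/output"))
```
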
Your directory should have contents similar to:

A text file should have contents similar to:

Known Issues

  • On some systems cTAKES cannot properly start an Artemis broker. In that case, start one manually:
  1. Open a terminal and navigate to your Artemis installation.
  2. Execute the command bin/artemis run.
  • You may see messages at the end of a PBJ run indicating that there were problems stopping the Artemis broker. Normally these can be ignored, but check your system for running instances of a broker. If all sub-pipelines of your PBJ pipeline have finished you can manually stop the broker.
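
One quick check for a broker that is still running is to test whether its port accepts connections. The following is a minimal sketch: the function name is hypothetical, and port 61616 is assumed because it is the default Artemis acceptor port. Adjust it if your broker is configured differently.

```python
import socket


def broker_is_listening(host="localhost", port=61616, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Assumption: 61616 is the default Artemis acceptor port. A True result
    only shows that the port is open, not that the listener is Artemis.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    if broker_is_listening():
        print("A process is still listening on the broker port.")
    else:
        print("Nothing is listening on the broker port.")
```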