Getting Started with PBJ - apache/ctakes GitHub Wiki
- Apache cTAKES
- Python virtual environment
- Apache Artemis broker
- Download Apache ActiveMQ Artemis here: https://activemq.apache.org/components/artemis/download/
- In a terminal, navigate to the apache-artemis folder
- From the apache-artemis folder, run the create command for your platform:
./bin/artemis create [name of your broker]
for Mac and Linux, or
bin\artemis create [name of your broker]
for Windows
- You will be prompted to create a username and password, and to choose a value for --allow-anonymous. Press Y for this option.
You should now have an Artemis broker and can run PBJ.
For more information see the Apache Artemis documentation.
For this example we will use the Piper File Submitter. You can also run the example using a shell script with a cTAKES installation if you prefer.
- Start the Piper File Submitter.
- Load the piper file called PbjSentencePrinter from the ctakes-examples module.
- The file path is
ctakes-examples/src/user/resources/org/apache/ctakes/examples/pipeline/PbjSentencePrinter.piper
- The GUI should have loaded the piper file and appear as below.
- In the parameter table, set parameter values for your system.
Parameter Name | Option | Value |
---|---|---|
InputDirectory | -i | Location for input. |
OutputDirectory | -o | Location for output. |
ArtemisBroker | -a | Location of your Artemis Broker. |
VirtualEnv | -v | Location of your Python virtual environment. |
PipPbj | --pipPbj | Run Python pip on PBJ. |
An example from my system is below.
The value for input directory is set to the example notes distributed with cTAKES:
ctakes-examples/src/user/resources/org/apache/ctakes/examples/notes/annotated/
The first time you run PBJ, set PipPbj to 'yes' so that pip installs the PBJ code and obtains the required libraries.
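If you run the example from a shell script rather than the GUI, the same five parameters are passed as command-line options. The sketch below only assembles the option list from the table above; it does not launch anything. The function name `pbj_options` and the placeholder paths are illustrative assumptions, and the launcher script that consumes the options is deliberately left out because it depends on your cTAKES installation.

```python
import shlex


def pbj_options(input_dir, output_dir, artemis_dir, venv_dir, pip_pbj="yes"):
    """Return the options from the parameter table as a flat argument list."""
    return [
        "-i", str(input_dir),      # InputDirectory
        "-o", str(output_dir),     # OutputDirectory
        "-a", str(artemis_dir),    # ArtemisBroker
        "-v", str(venv_dir),       # VirtualEnv
        "--pipPbj", pip_pbj,       # 'yes' on the first run
    ]


if __name__ == "__main__":
    # Placeholder paths (assumptions); replace them with locations on your system.
    options = pbj_options(
        "ctakes-examples/src/user/resources/org/apache/ctakes/examples/notes/annotated/",
        "/path/to/output",
        "/path/to/artemis-broker",
        "/path/to/venv",
    )
    # Print the options quoted for a shell, ready to append to a launcher command.
    print(shlex.join(options))
```

Keeping the options in a list rather than a single string avoids quoting problems when a path contains spaces.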
- Click the RUN button.
- You should see run output similar to that in the image below.
- When the run is complete you should see output similar to that in the image below.
NOTE
Any PBJ pipeline is composed of 2 or more sub-pipelines. For the PBJ Sentence Printer there are 3.
What you see in the GUI is only the output of the first sub-pipeline.
The first sub-pipeline may finish long before the second and third complete,
so do not be surprised if a Python process or second cTAKES (java) process is still running.
The run time required by these processes is dependent upon their complexity.
You can check the progress of sub-pipelines (and therefore the entire PBJ pipeline) by inspecting the output for completion.
Any PBJ run will create 4 log files in your specified output directory.
- ctakes_artemis_start.log will contain run information from the Apache Artemis broker.
- ctakes_artemis_stop.log may contain additional run information from the broker, but is normally empty.
- ctakes_PbjThirdStep.log contains output from the cTAKES pipeline that ran after the Python pipeline.
- sentence_printer_pipeline.log contains output from the Python pipeline.
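A quick way to see which of these logs have been written so far is to check the output directory from Python. This is a minimal sketch: the file names are the four listed for the PBJ Sentence Printer example, while the helper name `check_pbj_logs` is an illustrative assumption and not part of cTAKES or PBJ.

```python
from pathlib import Path

# Log file names listed for the PBJ Sentence Printer example.
EXPECTED_LOGS = (
    "ctakes_artemis_start.log",
    "ctakes_artemis_stop.log",
    "ctakes_PbjThirdStep.log",
    "sentence_printer_pipeline.log",
)


def check_pbj_logs(output_dir):
    """Map each expected log name to its size in bytes, or None if it is missing."""
    out = Path(output_dir)
    report = {}
    for name in EXPECTED_LOGS:
        log = out / name
        report[name] = log.stat().st_size if log.is_file() else None
    return report


if __name__ == "__main__":
    import sys

    # Use the directory given on the command line, or the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, size in check_pbj_logs(target).items():
        status = "missing" if size is None else f"{size} bytes"
        print(f"{name}: {status}")
```

A size of 0 bytes for ctakes_artemis_stop.log is expected, since that file is normally empty.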
sentence_printer_pipeline.log should have contents similar to:
ctakes_PbjThirdStep.log should have contents similar to:
The PBJ Sentence Printer example places per-document output in three directories:
html/
table/
text/
As the Python pipeline of the PbjSentencePrinter example only detects sentences,
the best output files for inspection are those in the text/ directory.
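Because output is written per document, comparing the number of files in text/ with the number of input notes gives a rough view of how far the later sub-pipelines have progressed. The sketch below assumes that the input directory is flat and contains only notes, and that each note produces exactly one file in text/; both are assumptions, and the helper names are illustrative.

```python
from pathlib import Path


def count_files(folder):
    """Count regular files directly inside folder; return 0 if it does not exist."""
    folder = Path(folder)
    if not folder.is_dir():
        return 0
    return sum(1 for entry in folder.iterdir() if entry.is_file())


def text_output_progress(input_dir, output_dir):
    """Return (files written to text/, files in the input directory)."""
    return count_files(Path(output_dir) / "text"), count_files(input_dir)


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        done, total = text_output_progress(sys.argv[1], sys.argv[2])
        print(f"{done} of {total} documents have text output")
    else:
        print("usage: python pbj_progress.py INPUT_DIR OUTPUT_DIR")
```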
Your directory should have contents similar to:
A text file should have contents similar to:
- On some systems cTAKES cannot properly start an Artemis broker. You must start one manually.
- Open a terminal and navigate to your Artemis installation.
- Execute the command bin/artemis run.
- You may see messages at the end of a PBJ run indicating that there were problems stopping the Artemis broker. Normally these can be ignored, but check your system for running instances of a broker. If all sub-pipelines of your PBJ pipeline have finished you can manually stop the broker.
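To check whether a broker is still running (or has started after a manual launch), you can test whether anything is accepting connections on the broker's port. The sketch below assumes the broker runs on localhost and uses the Artemis default acceptor port 61616; if your broker is configured differently, pass the host and port that apply. The function name is illustrative. A successful connection only shows that some process is listening there, not that it is your PBJ broker.

```python
import socket


def broker_is_listening(host="localhost", port=61616, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # Open and immediately close a connection; no messages are sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not reachable.
        return False


if __name__ == "__main__":
    if broker_is_listening():
        print("A process is listening on localhost:61616")
    else:
        print("Nothing is listening on localhost:61616")
```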