Plan Providers - nsoft/jesterj GitHub Wiki

The primary means of configuring JesterJ is building up a document processing plan. This plan is represented as a very simple Java class designed to have one method, and this method is normally organized into 4 parts.

Writing a Plan Provider

There are 3 basic things to do in a plan provider.

  1. Configure the steps you will need using fluent/builder APIs
  2. String your steps together by defining the predecessor(s) of each step using planBuilder.addStep()
  3. Call planBuilder.build() and return the result

You can see an example of this here:

ShakespeareConfig.java

Notice that the key method is organized as follows:

  1. Step builder instantiations
  2. Step builder configuration, including addition of builders for Processors at each step.
  3. Addition of steps to the plan with specification of predecessor steps
  4. Building the plan with planBuilder.build() and returning the result

This organization is recommended, but not required. The method may be organized in any fashion that is valid Java.
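The recommended layout can be sketched with a few stand-in classes. This is a minimal illustration, not the JesterJ API: SketchStepBuilder, SketchPlanBuilder, and SketchPlanProvider are hypothetical names, the "plan" is reduced to a map from each step name to its predecessor names, and the step names are made up. See ShakespeareConfig.java for the real builders.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a step builder; NOT a JesterJ class.
class SketchStepBuilder {
    private String name;

    SketchStepBuilder named(String name) {
        this.name = name;
        return this; // fluent style: return the builder for chaining
    }

    String getName() {
        return name;
    }
}

// Hypothetical stand-in for a plan builder; NOT a JesterJ class.
class SketchPlanBuilder {
    private final Map<String, List<String>> predecessors = new LinkedHashMap<>();

    // Registers a step together with the names of its predecessor steps.
    SketchPlanBuilder addStep(SketchStepBuilder step, String... predecessorNames) {
        predecessors.put(step.getName(), List.of(predecessorNames));
        return this;
    }

    // In this sketch the "plan" is just step name -> predecessor names.
    Map<String, List<String>> build() {
        return Collections.unmodifiableMap(new LinkedHashMap<>(predecessors));
    }
}

class SketchPlanProvider {
    static Map<String, List<String>> getPlan() {
        // Step builder instantiations
        SketchStepBuilder scanner = new SketchStepBuilder();
        SketchStepBuilder formatDate = new SketchStepBuilder();
        SketchStepBuilder sender = new SketchStepBuilder();

        // Step builder configuration (only names here; real steps also get processors)
        scanner.named("file_scanner");
        formatDate.named("format_date");
        sender.named("solr_sender");

        // Addition of steps to the plan, each with its predecessor(s)
        SketchPlanBuilder planBuilder = new SketchPlanBuilder();
        planBuilder
            .addStep(scanner)
            .addStep(formatDate, "file_scanner")
            .addStep(sender, "format_date");

        // Build the plan and return the result
        return planBuilder.build();
    }

    public static void main(String[] args) {
        System.out.println(getPlan());
    }
}
```

Keeping instantiation, configuration, and wiring in separate blocks makes the graph easy to read from the addStep() calls alone.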

Naming

All builders support a .named(String) method, and a unique textual name should be provided for each. These names are required to match the regular expression ^[A-Za-z][\w.]*$ (a name must start with a letter and thereafter contain only letters, digits, underscores, and dots). Spaces are NOT allowed, and because \w does not match a dash, the expression as written rejects dashes as well.
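To check candidate names before putting them in a plan, the quoted expression can be tried directly with java.util.regex. This is a small sketch: StepNameCheck and isValidName are hypothetical helpers, not part of JesterJ, and the sample names are made up. In Java, \w matches letters, digits, and underscore by default, so a name containing a dash does not match the expression as written.

```java
import java.util.regex.Pattern;

// Hypothetical helper for trying out names; NOT part of JesterJ.
class StepNameCheck {
    // The expression quoted in the text, escaped for a Java string literal.
    static final Pattern VALID_NAME = Pattern.compile("^[A-Za-z][\\w.]*$");

    static boolean isValidName(String name) {
        return name != null && VALID_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] candidates = {"solr_sender", "format.date", "2nd_step", "my step", "my-step"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isValidName(candidate));
        }
    }
}
```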

Structure

When planBuilder.build() is called, the structure of the plan is created and checked for cycles (any path that can lead back to the same node). Cycles (loops) are not allowed, and the build should fail if any are found. Any legal Directed Acyclic Graph is otherwise allowed, so long as:

  1. All tree roots (starting points) begin with implementations of Scanner
  2. All leaves (final steps) are configured with potent or idempotent processors

Note that it is fully legal and supported to have several sources feeding completely disconnected paths in the same plan.
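To make the cycle rule concrete, here is a sketch of one standard way to detect cycles, Kahn's algorithm, applied to a map from each step name to its predecessor names. JesterJ's own check is not shown on this page and may work differently. DagCheck and hasCycle are hypothetical names, and the sketch covers only the cycle rule, not the Scanner or potent/idempotent requirements.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of cycle detection; NOT JesterJ's implementation.
class DagCheck {
    // Returns true if the step -> predecessors map contains a cycle.
    static boolean hasCycle(Map<String, List<String>> predecessors) {
        Map<String, Integer> unresolved = new HashMap<>();       // predecessors not yet processed
        Map<String, List<String>> successors = new HashMap<>();  // reverse edges
        for (Map.Entry<String, List<String>> entry : predecessors.entrySet()) {
            unresolved.put(entry.getKey(), entry.getValue().size());
            for (String pred : entry.getValue()) {
                if (!predecessors.containsKey(pred)) {
                    throw new IllegalArgumentException("Unknown predecessor: " + pred);
                }
                successors.computeIfAbsent(pred, k -> new ArrayList<>()).add(entry.getKey());
            }
        }

        // Start from the roots: steps with no predecessors.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : unresolved.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }

        int processed = 0;
        while (!ready.isEmpty()) {
            String step = ready.poll();
            processed++;
            for (String next : successors.getOrDefault(step, List.of())) {
                // A step becomes ready once all of its predecessors are processed.
                if (unresolved.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        // Steps on a cycle never become ready, so they are never processed.
        return processed < unresolved.size();
    }

    public static void main(String[] args) {
        // Two sources feeding completely disconnected paths: still a legal DAG.
        Map<String, List<String>> twoPaths = new HashMap<>();
        twoPaths.put("scanner_a", List.of());
        twoPaths.put("sender_a", List.of("scanner_a"));
        twoPaths.put("scanner_b", List.of());
        twoPaths.put("sender_b", List.of("scanner_b"));
        System.out.println("two disconnected paths, cycle? " + hasCycle(twoPaths));

        // A loop between two steps downstream of a scanner.
        Map<String, List<String>> looped = new HashMap<>();
        looped.put("scanner", List.of());
        looped.put("step_x", List.of("scanner", "step_y"));
        looped.put("step_y", List.of("step_x"));
        System.out.println("looped plan, cycle? " + hasCycle(looped));
    }
}
```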

Visualization

Sometimes it is difficult to troubleshoot the construction of your DAG, so JesterJ has a built-in way to visualize the structure. This allows you to verify that the structure you think you specified is the structure you actually specified. For example, run:

    java -jar jesterj-node-1.0-beta2.jar -z viz.png example-shakespeare-1.0-beta2.jar NODENAME NODEPASS

This produces a PNG visualization of the plan that looks like this:

image

Plan Design Considerations

As of 1.0, plan performance is significantly impacted by the number of scanners and by the number of steps that must be tracked as output destinations (potent or idempotent steps). For example, trying to increase throughput by duplicating sender steps may actually decrease performance. The plan in the following diagram, with a batch size of 1000, was about 3x slower than a single solr_sender step with a batch size of 5000.

image

Plans in other JVM based languages

In theory it should be possible to write plans in other JVM languages, but you must compile the plan to bytecode and assemble it into a .jar file, and the bytecode must result in a class that has the @JavaPlanConfig annotation. Assuming that these requirements can be satisfied, it is theoretically possible to write a plan in:

  • Jython
  • JRuby
  • Scala
  • Kotlin
  • Groovy

If you attempt this, please share the results in the General section of Discussions. At some point in the future we would like to be able to dynamically consume and interpret some of these languages, so contributions are welcome there too!