Configuration File - GateNLP/gateplugin-ModularPipelines GitHub Wiki

Configuration File

The configuration file is a YAML file which must have the following format:

The file must contain a list of zero or more settings. Each setting consists of several key/value assignments. Which keys must be included depends on what is being set. Between the list elements, and between the key/value pairs of a list element, there may be any number of comment lines (starting with the hash character #) or empty lines. Each setting has the following generic format:

- set: <what>
  <key1>: <value1>
  ...

where <what> specifies what should be changed. Possible values are prparm (PR runtime parameter), docfeature (document feature), propset (Java property), prrun (PR running-enabled flag), prinit (PR init parameter), and inheritconfig (use the same config for sub-pipelines, except for init parameters). Different <what> values require different <key>s to be set; see below for details.

Note: YAML supports the conversion of values in the configuration file to Java objects. Therefore, it can make a difference whether you specify 12 or "12", or true or "true", for a parameter.
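As an illustration of this note, the sketch below sets runtime parameters with unquoted and quoted values. The controller name, PR name, and parameter names are hypothetical, and the types mentioned in the comments assume the usual YAML scalar resolution rather than anything specific to this plugin.

```yaml
# Hypothetical names: controller "main", PR "MyPR",
# parameters "maxLength" and "debug".
- set: prparm
  controller: main
  prname: MyPR
  name: maxLength
  value: 12        # unquoted: typically loaded as an integer

- set: prparm
  controller: main
  prname: MyPR
  name: debug
  value: "true"    # quoted: loaded as a string, not a boolean
```

If the parameter expects a number or a boolean, the unquoted form is usually the one to use; quote the value only when the parameter expects a string.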

NOTE: in addition to the configuration file, the Parametrized Corpus Controller can also be parametrized using system properties of a specific format. All system properties that should affect the parametrized corpus controller must start with a prefix that can be set using the property "at.ofai.gate.modularpipelines.propertyPrefix". The default prefix is "modularpipelines.". The prefix must be followed by the name of what to set, e.g. "prparm." and additional qualifying names, e.g. the controller name, processing resource name and parameter name for "prparm.". By default, the additional qualifying names are also separated by a dot, but the separator can be changed by setting the property "at.ofai.gate.modularpipelines.separator". The sections below describe the format of the property names for each kind of parametrization.

The config file itself can be set using the property at.ofai.gate.modularpipelines.configFile (the value should be the path to the file, not a URL).

PR Runtime Parameter

- set: prparm
  controller: <controllerName>
  prname: <processingResourceName>
  name: <runtimeParameterName>
  value: <runtimeParameterValue>

Runtime parameters are set every time the execute() method for a pipeline is invoked. This is once for the whole corpus for the outermost pipeline, but once for each document for sub-pipelines.

The equivalent format of the property name is "modularpipelines.prparm.<controllerName>.<processingResourceName>.<runtimeParameterName>"

PR Init Parameter

- set: prinit
  controller: <controllerName>
  prname: <processingResourceName>
  name: <initParameterName>
  value: <initParameterValue>

This will only work when a parametrizable pipeline is loaded and a config file is specified for that pipeline. Init parameters will not get changed if the config file is changed for a loaded pipeline or if the inheritconfig setting is used to change the config file of sub-pipelines (this happens after the sub-pipelines are loaded, so too late to change the init parameters).

NOTE: this can be used to also set the init parameters of LRs and other resources needed by the pipeline (so "prname" is not the best name here).

Init parameters are set during the de-serialization of a pipeline file.

The equivalent format of the property name is "modularpipelines.prinit.<controllerName>.<processingResourceName>.<initParameterName>"
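A minimal sketch of an init parameter setting, assuming a controller named "main" that contains a resource named "MyGazetteer" with an init parameter "caseSensitive". All three names are hypothetical and only serve to show the shape of the entry.

```yaml
# Hypothetical names: controller "main", resource "MyGazetteer",
# init parameter "caseSensitive".
# Applied only while the pipeline file is being loaded.
- set: prinit
  controller: main
  prname: MyGazetteer
  name: caseSensitive
  value: false
```

With the default prefix and separator, the corresponding property name would be "modularpipelines.prinit.main.MyGazetteer.caseSensitive".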

Document Feature

- set: docfeature
  name: <documentFeatureName>
  override: true|false
  value: <documentFeatureValue>

The override parameter is optional; its default value is true. If it is set to false, the feature will not be overridden by the pipeline if it is already set to some value.

Document features are set right before the first component in a pipeline is run for a document, both in the main pipeline and in sub-pipelines.

The equivalent format of the property name is "modularpipelines.docfeature.<documentFeatureName>" if the feature can be overridden and "modularpipelines.docfeature.<udocumentFeatureName>" if it should not get overridden.
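For illustration, the following sketch sets a document feature only when the document does not already carry one. The feature name "language" and the value "en" are hypothetical.

```yaml
# Hypothetical feature name and value.
# override: false keeps a value that is already present on the document.
- set: docfeature
  name: language
  override: false
  value: en
```

Leaving out the override line, or setting it to true, would replace any existing value of the feature each time the pipeline starts processing a document.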

Java Property

- set: propset
  name: <propertyName>
  value: <propertyValue>

Java properties are set when a config file is loaded (either at the end of initialization of the pipeline or when the config file URL is re-set). Note that re-setting or clearing the config file will not clear any of the Java properties which have been set earlier. To do this, a config file must be loaded which explicitly clears the properties.

Obviously, there is no equivalent property format for this since the property can be set directly anyway.
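A short sketch of a Java property setting. The property name "myapp.dataDir" and its value are hypothetical placeholders, not properties defined by this plugin.

```yaml
# Hypothetical property name and value.
# The property is set when this config file is loaded and stays set
# until something else changes it.
- set: propset
  name: myapp.dataDir
  value: /tmp/data
```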

PR Running Mode Flag

This can be used to set the running mode for a PR to true (always run) or false (never run).

- set: prrun
  controller: <controllerName>
  prname: <processingResourceName>
  value: true|false

Run modes are set at the same time as runtime parameters: every time the execute() method for a pipeline is invoked. This is once for the whole corpus for the outermost pipeline, but once for each document for sub-pipelines.

The equivalent format of the property name is "modularpipelines.prrun.<controllerName>.<processingResourceName>"
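As a sketch, the entry below disables one PR without removing it from the pipeline. The controller name "main" and the PR name "MyGazetteer" are hypothetical.

```yaml
# Hypothetical names: controller "main", PR "MyGazetteer".
# value: false means the PR is never run; true means it is always run.
- set: prrun
  controller: main
  prname: MyGazetteer
  value: false
```

With the default prefix and separator, the same effect should be obtainable by setting the property "modularpipelines.prrun.main.MyGazetteer" to false.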

Force Config File for Sub-Pipelines

- set: inheritconfig

If this setting occurs in a config file, an attempt will be made to use the same config file for all sub-pipelines and their sub-pipelines, overriding whatever config file may have been originally set for them. This will have no effect on any init parameters and it will only work if all direct and indirect sub-pipelines are parametrizable pipelines. If a parametrizable pipeline has a sub-pipeline which is some other kind of controller and this in turn contains a parametrizable pipeline, this nested pipeline will not be affected by the inheritconfig setting.
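The following sketch shows how a complete config file might combine inheritconfig with other settings, including the comment lines and empty lines the format allows. The prparm entry reuses the names from the Examples section below; the document feature name and value are hypothetical.

```yaml
# Use this config file for all parametrizable sub-pipelines as well.
- set: inheritconfig

# Runtime parameter, as in the Examples section.
- set: prparm
  controller: main
  prname: Tokenizer
  name: encoding
  value: UTF-8

# Hypothetical document feature.
- set: docfeature
  name: language
  value: en
```

Because inheritconfig takes effect only after the sub-pipelines have been loaded, any prinit entries in such a file would apply to the pipeline the file was specified for, but not to the sub-pipelines that inherit it.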

Examples

To set the runtime parameter "encoding" of the Processing Resource with the name "Tokenizer" in the controller with the name "main" to the value "UTF-8", the following entry in a config file could be used:

- set: prparm
  controller: main
  prname: Tokenizer
  name: encoding
  value: UTF-8

To accomplish the same with a property setting, e.g. when running the gate.sh command:

gate.sh -Dmodularpipelines.prparm.main.Tokenizer.encoding=UTF-8