Gate Plugin - GateNLP/cloud-client GitHub Wiki

This plugin provides a processing resource (PR) that calls a GATE Cloud service as part of a pipeline in GATE Developer or GATE Embedded. It passes the text of the document to the selected service and merges the returned annotations back into the document.

Getting the plugin

For GATE 8.5 and later, release versions of the plugin are published to the Central Repository (Maven Central) and can be loaded into GATE using the group ID uk.ac.gate.plugins, the artifact ID gate-cloud-plugin and the appropriate version number (the latest release is 1.1). For the latest development code, run mvn install in the root of the GATE Cloud client source tree. This builds snapshot versions of the client library and the plugin and installs them into your local Maven repository, from where GATE can load them as normal.

In GATE 8.4.1 and earlier, use the "additional plugins from the GATE team" plugin repository. Set a user plugin directory and enable this repository via the "configuration" tab in the plugin manager, then select the "GateCloudClient" plugin from the "available to install" tab.

Using the GATE Cloud client PR

With the plugin loaded, you can create an instance of the GATE Cloud Client PR in the normal way via the "new processing resource" menu. The PR has three init parameters:

  • endpointUrl (required) - the URL of the GATE Cloud service you want to call. This can be found in the GATE Cloud shop, under "use this pipeline".
  • apiKey and apiPassword - your personal API key ID and the corresponding password. API keys can be generated from your account page on https://cloud.gate.ac.uk. Note that an API key is not strictly required when calling public services, but it is strongly recommended as unauthenticated access is subject to much lower quotas and rate limits than access with an API key. If you will be processing more than a handful of documents then you should register for a free account and generate an API key.
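The constraints on the init parameters can be summarised in a small value holder. This is only an illustrative sketch: CloudClientInitParams is a hypothetical class, not part of the plugin, the endpoint URL in the usage note is a placeholder, and the rule that apiKey and apiPassword must be supplied together is an assumption drawn from the description above.

```java
import java.net.URI;

// Hypothetical holder for the three init parameters; NOT a class from the plugin.
final class CloudClientInitParams {
    final URI endpointUrl;    // required
    final String apiKey;      // optional, may be null
    final String apiPassword; // optional, may be null

    CloudClientInitParams(String endpointUrl, String apiKey, String apiPassword) {
        if (endpointUrl == null || endpointUrl.trim().isEmpty()) {
            throw new IllegalArgumentException("endpointUrl is required");
        }
        // Assumption: the key ID and its password only make sense as a pair.
        if ((apiKey == null) != (apiPassword == null)) {
            throw new IllegalArgumentException("apiKey and apiPassword must be set together");
        }
        // URI.create throws IllegalArgumentException on malformed input.
        this.endpointUrl = URI.create(endpointUrl.trim());
        this.apiKey = apiKey;
        this.apiPassword = apiPassword;
    }

    // Unauthenticated access is possible but subject to lower quotas and rate limits.
    boolean isAuthenticated() {
        return apiKey != null;
    }
}
```

For example, new CloudClientInitParams("https://example.org/my-service", null, null) describes an unauthenticated configuration, while passing only one of the two credentials is rejected.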

The PR also has a number of runtime parameters. These can be set in the normal way when adding the PR to a pipeline, but it is much more convenient to use the service configuration editor, which opens when you double-click the PR in the resources tree on the left.

Figure: GATE Cloud service configuration for ANNIE

Click the "fetch service metadata" button to retrieve metadata from the service endpoint describing the available annotation types and the annotation sets in which they will be created. This will create (a) a set of checkboxes where you can select which annotations you are interested in and (b) a table where you can map each output annotation set name from the service to a name in the document. By default each annotation set name is mapped to itself, i.e. if the service creates annotations in its default annotation set then those annotations will end up in the default annotation set of your document in GATE.
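The mapping table described above behaves like a map from service annotation set names to document annotation set names, initialised to the identity. The following is a minimal sketch of that idea only. AnnotationSetMapping is a hypothetical class, not the plugin's own data structure, and representing the default annotation set by the empty string is a convention of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the editor's mapping table; NOT the plugin's implementation.
final class AnnotationSetMapping {
    private final Map<String, String> mapping = new LinkedHashMap<>();

    // By default every service set name is mapped to itself.
    AnnotationSetMapping(List<String> serviceSetNames) {
        for (String name : serviceSetNames) {
            mapping.put(name, name);
        }
    }

    // Redirect annotations from one service set into a differently named document set.
    void map(String serviceSet, String documentSet) {
        if (!mapping.containsKey(serviceSet)) {
            throw new IllegalArgumentException("Unknown service annotation set: " + serviceSet);
        }
        mapping.put(serviceSet, documentSet);
    }

    // Name of the document set that receives annotations from the given service set.
    String targetFor(String serviceSet) {
        return mapping.get(serviceSet);
    }
}
```

With this model, leaving the table untouched keeps annotations from the service's default set in the document's default set, while a call such as map("", "Cloud") would place them in a document set named "Cloud" instead.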

The other runtime parameters accepted by the PR are:

  • minDelay - the minimum number of milliseconds to wait between calls to the service. The default is 501, which keeps the PR within the default rate limit of two calls per second on average. Only change this if you have been told to do so by GATE Cloud support.
  • sendOnlyText - if true (the default), only the document text is sent to the service. If false, the entire document is serialized as GATE XML, including document features and any existing annotations. This option is occasionally useful if you are calling services that expect certain metadata in the Original markups annotation set, e.g. Twitter-related services that make use of the metadata parsed from JSON by GATE's tweet document format.
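To see why 501 ms stays within the limit: 1000 / 501 is roughly 1.996, so at most just under two calls can start per second. The effect of minDelay can be sketched as a simple throttle that computes how long to wait before the next call. This is an illustration of the idea, not the plugin's actual code. MinDelayThrottle and its reserve method are hypothetical names.

```java
// Hypothetical throttle illustrating a minimum delay between calls;
// NOT taken from the plugin source.
final class MinDelayThrottle {
    private final long minDelayMillis;
    private long lastCallMillis;
    private boolean calledBefore = false;

    MinDelayThrottle(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    // Returns how many milliseconds the caller should wait before making a call
    // at time nowMillis, and records the call as starting once that wait is over.
    long reserve(long nowMillis) {
        long wait = 0;
        if (calledBefore) {
            long earliestAllowed = lastCallMillis + minDelayMillis;
            wait = Math.max(0, earliestAllowed - nowMillis);
        }
        lastCallMillis = nowMillis + wait;
        calledBefore = true;
        return wait;
    }
}
```

A caller would typically run Thread.sleep(throttle.reserve(System.currentTimeMillis())) before each request. Passing the clock value in explicitly keeps the logic easy to test without real waiting.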