Tagger_GoogleNLP - GateNLP/gateplugin-Tagger_GoogleNLP GitHub Wiki

Tagger_GoogleNLP Processing Resource

This processing resource (PR) connects to the Google service URL, sends the text of your document to the Google server for annotation, and uses the results it receives to create GATE annotations in your GATE document.

To use the Google service, you need a Google service account whose key is authorised to use the NLP API, and the NLP API must be enabled. This may require an active billing account.

Once the service account is created, download its JSON key file. This should be a file with a name of the form -.json. The location of this file must be specified either as a runtime parameter (keyFileUrl) or by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS to its location before starting GATE.
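
The lookup order described above (runtime parameter first, environment variable otherwise) can be illustrated with a minimal Python sketch. This is not the plugin's code: the function name resolve_key_file and the choice to raise ValueError when neither source is set are assumptions made for illustration only.

```python
import os
from typing import Mapping, Optional


def resolve_key_file(key_file_url: Optional[str],
                     env: Mapping[str, str] = os.environ) -> str:
    """Return the key file location (hypothetical helper, not plugin API)."""
    # An explicitly given keyFileUrl parameter is used as-is.
    if key_file_url:
        return key_file_url
    # Otherwise fall back to the environment variable named in the text.
    location = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not location:
        # Assumption: how the plugin reports this case is not documented here.
        raise ValueError(
            "no key file: set keyFileUrl or GOOGLE_APPLICATION_CREDENTIALS")
    return location
```

Passing the environment as a parameter keeps the sketch easy to try out without changing the real process environment.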

Note that this PR only needs the text of the documents, so no preprocessing steps are required. If no containingAnnotationType is specified, the text of the whole document is annotated. If containingAnnotationType is specified, only the text covered by annotations of that type from the input annotation set is annotated, and the service is invoked separately for the text covered by each annotation. This makes it possible, for example, to annotate the sentiment of arbitrary portions of the text.
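
The selection of text to send can be sketched as follows. This is a simplified Python illustration rather than the plugin's implementation: containing annotations are modelled as hypothetical (start, end) offset pairs, and the function name texts_to_annotate is invented for the example.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start offset, end offset), end exclusive


def texts_to_annotate(text: str,
                      containing_spans: Optional[List[Span]]
                      ) -> List[Tuple[int, str]]:
    """Return one (start offset, text) pair per service request."""
    if containing_spans is None:
        # No containingAnnotationType given: one request for the whole document.
        return [(0, text)]
    # One request per containing annotation; an empty list yields no requests.
    return [(start, text[start:end]) for start, end in sorted(containing_spans)]


doc = "Great food. Terrible service."
for offset, chunk in texts_to_annotate(doc, [(12, 29), (0, 11)]):
    print(offset, repr(chunk))
```

Keeping the start offset next to each chunk is one plausible way to map results returned for a chunk back to positions in the full document.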

Runtime Parameters

  • annotateEntities (Boolean, default true) whether or not named entities should be annotated. For entities, annotations of the type "Entity" are placed into the output annotation set.
  • annotateSentiment (Boolean, default true) whether or not the sentiment of the document, or of the spans covered by the containing annotations, should be annotated. If true, annotations of the type "Sentiment" are placed into the output annotation set.
  • annotateSyntax (Boolean, default false) whether or not to annotate tokens and dependency edges. If this is true, annotations of the type "Token" and "Sentence" are placed into the output annotation set.
  • applicationName (String, default "GateTaggerGoogleNLP") the application name to send to the Google service.
  • containingAnnotationType (String, no default) if this is specified, then the service is invoked separately for the text covered by each of the annotations of this type in the input annotation set. If a document does not contain any such annotation, nothing is done for that document.
  • inputAnnotationSet (String, default is empty, indicating the default annotation set) This is only relevant if containingAnnotationType is set; in that case it names the annotation set that holds the containing annotations.
  • keyFileUrl (URL, no default) the location of the JSON service account key downloaded from the Google cloud console. This is a json file that should contain the fields private_key and project_id, among others, and the field type with the value service_account. If this is not specified, the plugin expects the environment variable GOOGLE_APPLICATION_CREDENTIALS to be set to the location of that file.
  • languageCode (String, default empty) the language code for the language of the text to be processed. As of this writing, Google supports the languages English (language code "en"), Spanish (code "es") and Japanese (code "ja").
  • outputAnnotationSet (String, default "GoogleNLP") the name of the annotation set where the generated annotations are placed.
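
Before pointing keyFileUrl at a downloaded key, it can be useful to check that the file has the fields listed above. The following Python sketch does only that. The helper names key_problems and check_key_file are hypothetical, and the check covers just the three fields named in this document, not everything Google may require.

```python
import json
from typing import Any, List, Mapping


def key_problems(key: Mapping[str, Any]) -> List[str]:
    """List what is wrong with a parsed key file; empty list means it looks fine."""
    problems = []
    if key.get("type") != "service_account":
        problems.append('field "type" should have the value "service_account"')
    for field in ("private_key", "project_id"):
        if not key.get(field):
            problems.append(f'field "{field}" is missing or empty')
    return problems


def check_key_file(path: str) -> List[str]:
    """Parse the JSON key file at a local path and check its fields."""
    with open(path, encoding="utf-8") as handle:
        return key_problems(json.load(handle))
```

Calling check_key_file with the path of the downloaded key returns a list of human-readable problems, which is empty when all three fields are present.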

Generated Annotations

The following annotations and features are generated, depending on which of the three flags annotateEntities, annotateSentiment, and annotateSyntax are set to true: