Tagger_GoogleNLP - GateNLP/gateplugin-Tagger_GoogleNLP GitHub Wiki
Tagger_GoogleNLP Processing Resource
This processing resource (PR) connects to the Google service URL, sends the text of your document to the Google server for annotation, and receives the annotation results, which are used to create GATE annotations in your GATE document.
To use the Google service, you need a Google service account. Make sure that its key is authorised to use the NLP API and that the NLP API is enabled. This may require an active billing account.
Once the service account is created, download the JSON key file for it; this should be a file with a name of the form -.json. The location of this file must be specified either as a runtime parameter (keyFileUrl) or by setting the environment variable GOOGLE_APPLICATION_CREDENTIALS to its location before starting GATE.
Note that this PR only needs the text of the documents, so no preprocessing steps are required. If no containingAnnotationType is specified, the text of the whole document is annotated. If containingAnnotationType is specified, then only text covered by annotations of that type from the input annotation set is annotated, and the service is invoked separately for the text covered by each annotation. This makes it possible, for example, to annotate the sentiment of arbitrary portions of the text.
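The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the plugin's code: the function name `texts_to_send` and the representation of containing annotations as `(start, end)` offset pairs are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for a GATE annotation: (start offset, end offset).
Span = Tuple[int, int]


def texts_to_send(text: str, containing_spans: Optional[List[Span]]) -> List[Tuple[int, str]]:
    """Return (start offset, text) pairs, one per service request.

    containing_spans is None when no containingAnnotationType is set;
    otherwise it holds the spans of the annotations of that type.
    """
    if containing_spans is None:
        # No containing type: the whole document goes out in a single request.
        return [(0, text)]
    # One request per containing annotation; an empty list means nothing is sent.
    return [(start, text[start:end]) for start, end in sorted(containing_spans)]


doc = "Great phone. Terrible battery."
print(texts_to_send(doc, None))
print(texts_to_send(doc, [(13, 30), (0, 12)]))
print(texts_to_send(doc, []))
```

Keeping the start offset next to each segment is a design choice of this sketch: anything the service reports relative to a segment could then be shifted back to document offsets.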
Runtime Parameters
annotateEntities
(Boolean, default true) whether or not named entities should be annotated. For entities, annotations of the type "Entity" are placed into the output annotation set.
annotateSentiment
(Boolean, default true) whether or not the sentiment of the document, or of the spans covered by the containing annotations, should be annotated. If true, annotations of the type "Sentiment" are placed into the output annotation set.
annotateSyntax
(Boolean, default false) whether or not to annotate tokens and dependency edges. If this is true, annotations of the type "Token" and "Sentence" are placed in the output annotation set.
applicationName
(String, default "GateTaggerGoogleNLP") the application name to send to the Google service.
containingAnnotationType
(String, no default) if this is specified, then the service is invoked separately for the text covered by each of the annotations of this type in the input annotation set. If a document does not contain any such annotation, nothing is done for that document.
inputAnnotationSet
(String, default is empty, indicating the default annotation set) This is only relevant if containingAnnotationType is set; in that case it names the annotation set that holds the containing annotations.
keyFileUrl
(URL, no default) the location of the JSON service account key downloaded from the Google cloud console. This is a JSON file that should contain the fields private_key and project_id, among others, and the field type with the value service_account. If this is not specified, the plugin expects the environment variable GOOGLE_APPLICATION_CREDENTIALS to be set to the location of that file.
languageCode
(String, default empty) The language code for the language of the text to be processed. As of this writing Google supports the languages English (language code "en"), Spanish (code "es") and Japanese (code "ja").
outputAnnotationSet
(String, default "GoogleNLP") the name of the annotation set where the generated annotations are placed.
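The key file requirements described for keyFileUrl can be checked before starting GATE. The following is a minimal sketch, assuming the key is available as a local file path rather than a general URL; the helper names `resolve_key_file`, `load_key` and `check_key_fields` are hypothetical and not part of the plugin.

```python
import json
import os
from typing import List, Optional


def resolve_key_file(key_file: Optional[str] = None) -> str:
    """Pick the key file path: an explicit value first, then the environment variable."""
    path = key_file or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("no key file given and GOOGLE_APPLICATION_CREDENTIALS is not set")
    return path


def load_key(path: str) -> dict:
    """Read the JSON service account key from a local file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def check_key_fields(key: dict) -> List[str]:
    """Return a list of problems; an empty list means the expected fields are present."""
    problems = []
    if key.get("type") != "service_account":
        problems.append('field "type" should have the value "service_account"')
    for field in ("private_key", "project_id"):
        if not key.get(field):
            problems.append(f'missing field "{field}"')
    return problems
```

A typical use would be `check_key_fields(load_key(resolve_key_file()))`, which reads the file named by the environment variable and reports any of the fields above that are missing. The check only looks at the fields named in this page; it does not verify that the key is authorised for the NLP API.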
Generated Annotations
The following annotations and features are generated, depending on which of the three flags annotateEntities, annotateSentiment, and annotateSyntax are set to true:
Token
  category: the POS tag
  root: the lemma
  depLabel: the label of the dependency edge for this token
  headId: the annotation id of the Token annotation that is the head this annotation connects to via the edge
Entity
  salience: the salience score associated with the entity, in the [0, 1.0] range. See https://cloud.google.com/natural-language/reference/rest/v1beta1/Entity
  type: one of the entity types listed here: https://cloud.google.com/natural-language/reference/rest/v1beta1/Entity#Type
  wikipedia_url: the URL of the Wikipedia page for this entity
Sentence: the sentences found; no features
Language: added if the languageCode parameter has been left blank
  lang: the language code of the auto-detected language
Sentiment
  magnitude, polarity: see https://cloud.google.com/natural-language/reference/rest/v1beta1/Sentiment
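Because headId holds the annotation id of another Token, the dependency structure can be recovered by grouping tokens under their heads. The sketch below illustrates this with plain dictionaries standing in for Token annotations. The "string" feature, the label values, and the convention that the root token points at itself are assumptions made for the example, not behaviour confirmed by this page.

```python
from typing import Dict, List

# Hypothetical stand-in for Token annotations: annotation id -> features.
tokens = {
    1: {"string": "She", "depLabel": "NSUBJ", "headId": 2},
    2: {"string": "reads", "depLabel": "ROOT", "headId": 2},
    3: {"string": "books", "depLabel": "DOBJ", "headId": 2},
}


def dependents_by_head(tokens: Dict[int, dict]) -> Dict[int, List[int]]:
    """Group token ids under the id of their head, following the headId feature."""
    result: Dict[int, List[int]] = {}
    for ann_id, feats in sorted(tokens.items()):
        head = feats["headId"]
        if head == ann_id:
            # Assumed convention: the root token refers to itself, so it has no head.
            continue
        result.setdefault(head, []).append(ann_id)
    return result


print(dependents_by_head(tokens))  # {2: [1, 3]}
```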