Tagger_SyntaxNet - GateNLP/gateplugin-Tagger_SyntaxNet GitHub Wiki

Processing Resource Tagger_SyntaxNet

Runtime parameters

  • containingAnnotationType (String, no default): If this is specified, annotations of this type from the input annotation set identify the spans in the document that should get annotated. The PR sends one request to the server for each span. This can be used, for example, to annotate only the text without the boilerplate, or only the text of a specific language in a mixed-language document.
  • inputAnnotationSet (String, default is empty for the default annotation set): only relevant if the containingAnnotationType parameter is specified, in which case it is the annotation set that contains the containing annotations.
  • outputAnnotationSet (String, default is empty for the default annotation set): annotation set where the new annotations will be added.
  • serverAddress (String, default is 127.0.0.1): the address/hostname of the host where the SyntaxNet server is running
  • serverPort (Integer, default is 9000): the port number the SyntaxNet server uses
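
The effect of containingAnnotationType can be sketched as follows. This is a minimal illustration in plain Python, not the plugin's actual code: `Span` and `texts_to_send` are hypothetical names, and sending the whole document when the parameter is not specified is an assumption.

```python
from typing import List, NamedTuple, Optional


class Span(NamedTuple):
    """Hypothetical stand-in for a containing annotation: character offsets only."""
    start: int
    end: int


def texts_to_send(document: str, containing: Optional[List[Span]]) -> List[str]:
    """Return one text per request that would be exchanged with the server.

    `containing` is None when containingAnnotationType is not specified.
    Assumption: in that case the whole document is sent as a single request.
    """
    if containing is None:
        return [document]
    # One request per containing span, in document order.
    return [document[span.start:span.end] for span in sorted(containing)]


doc = "Header\nHello world.\nFooter"
# Only the middle line is covered by a containing annotation.
requests = texts_to_send(doc, [Span(7, 19)])
```

Here `requests` holds only the text covered by the span, so the header and footer lines would never reach the server.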

Output

The PR creates the following annotation types in the output annotation set:

  • Sentence : for each Sentence found by the server
  • Token: for each Token found by the server. Note that SyntaxNet only finds tokens and completely ignores any white space, so unlike with other GATE tokenisers, no "SpaceToken" annotation is created.

The Token annotations contain the following features (NOTE: the contents of the fields category and tag are switched with respect to the fields created by SyntaxNet, for better compatibility with GATE conventions!):

  • breaklevel: how the preceding token is separated from the current token. This is one of:
    • NO_BREAK
    • SPACE_BREAK
    • LINE_BREAK
    • SENTENCE_BREAK
  • category: this is the content of the field tag SyntaxNet returns for each Token and contains the language-specific POS tag
  • tag: this is the content of the field category SyntaxNet returns for each Token and contains the universal POS tag (http://universaldependencies.org/u/pos/)
  • headId: the annotation ID of the annotation which is the head of this annotation. For a ROOT token, this is the annotation ID of the containing Sentence annotation.
  • label: the label of the dependency parse arc
  • word: the original word string
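
The tag/category swap and the headId convention described above can be sketched in plain Python. This is an illustration under stated assumptions, not the plugin's implementation: `to_gate_features` and `path_to_root` are hypothetical helpers, and apart from tag and category, the input key names (`break_level`, `label`, `word`) are assumed.

```python
from typing import Dict, List


def to_gate_features(sn_token: Dict[str, str], head_id: int) -> Dict[str, object]:
    """Build the Token features from a SyntaxNet-style token dict.

    Note the deliberate swap: GATE `category` takes SyntaxNet `tag`,
    and GATE `tag` takes SyntaxNet `category`.
    """
    return {
        "breaklevel": sn_token["break_level"],  # assumed input key name
        "category": sn_token["tag"],            # language-specific POS tag
        "tag": sn_token["category"],            # universal POS tag
        "headId": head_id,
        "label": sn_token["label"],
        "word": sn_token["word"],
    }


def path_to_root(token_id: int, tokens: Dict[int, Dict[str, object]]) -> List[int]:
    """Follow headId links from a token up to the ROOT token.

    `tokens` maps Token annotation IDs to their feature dicts. The walk stops
    when headId is not a Token ID, which for a ROOT token is the ID of the
    containing Sentence annotation.
    """
    path = [token_id]
    seen = {token_id}
    while True:
        head = tokens[path[-1]]["headId"]
        if head not in tokens:
            return path
        if head in seen:
            raise ValueError("cycle in headId links")
        path.append(head)
        seen.add(head)


# "Dogs bark loudly": Token IDs 1-3, Sentence annotation ID 0 (made-up IDs).
tokens = {
    1: to_gate_features({"word": "Dogs", "tag": "NNS", "category": "NOUN",
                         "label": "nsubj", "break_level": "NO_BREAK"}, 2),
    2: to_gate_features({"word": "bark", "tag": "VBP", "category": "VERB",
                         "label": "ROOT", "break_level": "SPACE_BREAK"}, 0),
    3: to_gate_features({"word": "loudly", "tag": "RB", "category": "ADV",
                         "label": "advmod", "break_level": "SPACE_BREAK"}, 2),
}
```

With this data, `path_to_root(1, tokens)` walks from "Dogs" to its head "bark" and stops there, because the headId of "bark" refers to the Sentence annotation rather than to another Token.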