REST API

SemTK REST endpoints can be accessed from many languages or tools such as curl, Swagger, and Postman. REST client code is also available.

There are two main types of endpoints:

  • asynchronous job execution - a sequence of calls to launch a job, wait until it completes, and retrieve results. Details are below.
  • synchronous call - returns results in a single call. These are used during testing and for simpler operations that are likely to return quickly.

Queries are often run using Nodegroup IDs, which refer to nodegroups that have already been saved in SemTK.

Many services are designed to accept SPARQL connection strings, to indicate the graphs and triplestore(s) with which the application will interact. The SPARQL connection string(s) should be easily configured so they can be changed, for example, when an application moves from development to production, or when a data source changes. Use of nodegroup default connections is strongly discouraged in production environments.

Many jobs return data in the form of a SemTK table.

All calls should check for errors.

Default ports

The default ports for the most commonly used services are:

  • nodegroup execution 12058 - this is the most commonly used service
  • nodegroup store 12056 - storage of nodegroups by id
  • nodegroup service 12059 - interrogating and changing nodegroups
  • ontology info 12057 - information about the model

Lesser-used services, whose most common endpoints are also available through the nodegroup execution service:

  • status 12051 - job status
  • results 12052 - retrieve results
  • query 12050 - running queries

Each of these ports has a swagger page with a full listing of endpoints (e.g. host:12058/swagger-ui.html). The most commonly used endpoints are shown below.

Common endpoints

Execute a select query

Here is the simplest way to launch a query to select data.

curl -X POST protocol://host:12058/nodeGroupExecution/dispatchSelectById \
-H "Content-Type: application/json" \
-d '{"nodegroupId":"MyNodegroup", "sparqlConnection": "NODEGROUP_DEFAULT"}' 

The above assumes that the Nodegroup ID "MyNodegroup" is already stored in SemTK, and uses the default sparqlConnection stored with that nodegroup. See Common REST parameters below to override the sparqlConnection (recommended), limit the number of results, or provide runtime constraints to the query.

A successful response will return a JobId, which should be used as follows to wait for and retrieve results.

{
  "message": "operations succeeded.",
  "status": "success",
  "simpleresults": {
    "JobId": "req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0"
  }
}

Execute a delete query

A delete query is similar to a select query.

POST: host:12058/nodeGroupExecution/dispatchDeleteById
{
 "nodeGroupId": "BLAST_GRC_ExpectedFunding",
 "sparqlConnection": "NODEGROUP_DEFAULT",      // overriding the connection is recommended
 "runtimeConstraints": "[{\"SparqlID\":\"?sso\",\"Operator\":\"MATCHES\",\"Operands\":[\"200001934\"]}]"   // very common optional parameter
}

Success will generate a response just like a select query:

{
  "message": "operations succeeded.",
  "status": "success",
  "simpleresults": {
    "JobId": "req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0"
  }
}

simpleresults.JobId should be used as described in Completing asynchronous jobs given a jobId below. The table returned from a successfully-completed delete query will have a @message column with a single cell containing a message from the triplestore describing the success, illustrated below.
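
For illustration only (the exact message text comes from the triplestore and will vary), such a table might look like:

{
  "message": "operations succeeded.",
  "status": "success",
  "table": {
    "@table": {
      "col_names": ["@message"],
      "col_type": ["string"],
      "col_count": 1,
      "rows": [["Delete succeeded"]],
      "row_count": 1,
      "type": "TABLE"
    }
  }
}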

Ingest CSV data

When ingesting, it is important to avoid the deprecated synchronous endpoints, which may fail on large ingestions.

Ingestion endpoints are available on the ingestion service and the one-stop-shopping nodeGroupExecution service. There are variations based on ingestion by nodegroupId, csv strings, and multipart csv files (ingestion service only).

NodeGroupExecution service ingestion endpoints:

  • nodeGroupExecution/ingestFromCsvStringsByIdAsync - ingest using a nodegroupId and csv, both as strings
  • nodeGroupExecution/ingestFromCsvStringsAsync - ingest using a nodegroup JSON and csv, both as strings

Ingestion service endpoints:

  • ingestion/fromCsvAsync - ingest using nodegroup json and csv, both as strings
  • ingestion/fromCsvFileAsync - ingest using multi-part files

Check the swagger endpoints to confirm the following parameters on each endpoint.

Nodegroup is specified by one of these parameters:

  • template or jsonRenderedNodeGroup - nodegroup json
  • nodegroupId - nodegroup id from the store

CSV data is specified with one of:

  • data or
  • csvContent

Optional but highly recommended connection override is specified with:

  • connectionOverride or
  • sparqlConnection

Additional parameters are available on most endpoints (a short usage sketch follows this list):

  • trackFlag - boolean - use the ingestion tracking features
  • overrideBaseURI - string - when generating URIs, use this base, e.g. "uri://My/Base"
  • skipPrecheck - boolean - skip the precheck step
  • skipIngest - boolean - skip the ingestion step
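
As a sketch (placeholder values; confirm the exact parameter names for your endpoint on its swagger page), a validation-only request that prechecks the CSV without writing any triples might add the optional flags like this:

{
 "nodegroupId": "ingestTemplateName",
 "csvContent": "...",
 "sparqlConnection": "...",
 "trackFlag": false,
 "skipPrecheck": false,
 "skipIngest": true
}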

Example of nodeGroupExecution/ingestFromCsvStringsByIdAsync:

POST: host:12058/nodeGroupExecution/ingestFromCsvStringsByIdAsync
{  
 "nodegroupId":"ingestTemplateName",  
 "csvContent":"column1, column2, column3\nvalue 1a, 2, 3.5\nvalue2a,3,42.6\n",
 "sparqlConnection":"{  \"name\":\"My_conn\",  \"domain\":\"\",  \"enableOwlImports\":true,  \"model\":[{    \"type\":\"neptune\",    \"url\":\"http://blast-cluster.cluster-ceg7ggop9fho.us-east-1.neptune.amazonaws.com:8182/\",    \"graph\":\"http://blast-test/model\"  }],  \"data\":[{    \"type\":\"neptune\",    \"url\":\"http://blast-cluster.cluster-ceg7ggop9fho.us-east-1.neptune.amazonaws.com:8182/\",    \"graph\":\"http://blast-test/data\"  }] }"  
}

A sample response:

{
  "message":"operations succeeded.",
  "status":"success",
  "warnings": ["first warning", "second warning"],
  "simpleresults":{
    "JobId":"job-8c2bd241-6633-4866-b525-2e91cb9a4800"
  }
}

The warnings field may exist if there are warnings such as missing or extra columns. These warnings are for informational purposes only.

The JobId should be used as described in Wait for job to complete below.

Note that the status of "success" does not mean that the ingestion has succeeded, only that the service was reached and the job was kicked off successfully.

Get nodegroup information

POST: http://host:12056/nodeGroupStore/getNodeGroupMetadata

This runs synchronously and returns a table of nodegroup id, comments, creation date, and creator.

{
  "message": "operations succeeded.",
  "table": {
    "@table": {
      "col_names": [
        "ID",
        "comments",
        "creationDate",
        "creator"
      ],
      "rows": [
        [
          "My favorite nodegroupID",
          "query returns information about something",
          "2020-04-19",
          "205000999"
        ],
        ...
      ]
    }
  }
}

Get nodegroup's runtime constraints

POST: host:12056/nodeGroupStore/getNodeGroupRuntimeConstraints
{  
 "nodeGroupId": "My nodegroup ID"
}  

This runs synchronously and returns a table of each runtime-constrainable variable's id, item type (PROPERTYITEM or NODE), and value type.

{
  "message": "operations succeeded.",
  "table": {
    "@table": {
      "col_names": [
        "valueId",
        "itemType",
        "valueType"
      ],
      "rows": [
        [
          "?productionStage",
          "PROPERTYITEM",
          "STRING"
        ],
        [
          "?alloyName",
          "PROPERTYITEM",
          "STRING"
        ]
      ],
      "type": "TABLE",
      "col_type": [
        "string",
        "string",
        "string"
      ],
      "col_count": 3,
      "row_count": 2
    }
  },
  "status": "success"
}

Completing asynchronous jobs given a jobId

The general flow for asynchronous jobs is:

  • launch a query, e.g. with /dispatchSelectById
  • wait with /waitForPercentOrMsec
  • if succeeded, get table with /getResultsTable

REST endpoints are provided on the one-stop-shopping nodegroup execution service (usually port 12058), and also on the status and results services (usually ports 12051 and 12052). Use the swagger pages to double-check the latest version's parameters.

Wait for job to complete

Once a job is successfully launched, track its progress with /waitForPercentOrMsec, which returns when either maxWaitMsec passes or the job reaches percentComplete. Choose a maxWaitMsec short enough to avoid any timeout at the HTTP layer. percentComplete may simply be set to 100, or it can be increased incrementally in order to drive a status bar in your app.

The /waitForPercentOrMsec endpoint is available on the status service (often port 12051) and the nodegroup execution service (often port 12058).

curl -X POST protocol://host:12058/nodeGroupExecution/waitForPercentOrMsec \
-H "Content-Type: application/json" \
-d '{"jobID": "req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0", "maxWaitMsec":10000, "percentComplete":5 }'

Sample response while job is incomplete:

{
  "message": "operations succeeded.",
  "status": "success",
  "simpleresults": {
    "percentComplete": "50",
    "statusMessage": "still waiting"
  }
}

Sample success response:

{
  "message": "operations succeeded.",
  "status": "success",
  "simpleresults": {
    "percentComplete": "100",
    "statusMessage": "Everything was great",
    "status": "Success"
  }
}

Sample failure response:

{
  "message": "operations succeeded.",
  "status": "success",
  "simpleresults": {
    "percentComplete": "100",
    "statusMessage": "You asked for a failure",
    "status": "Failure"
  }
}

If simpleresults.percentComplete is less than 100, make repeated calls until it reaches 100.

When interpreting this response, note that status and message refer to the REST call only.

Look for simpleresults.status to indicate "Success" or "Failure" of the job, and simpleresults.statusMessage for an explanation of any failure.
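
A minimal polling sketch in bash (assuming the jq tool is installed; the protocol, host, and job id are placeholders, and REST-level failures are not handled here - see Error responses below):

JOB_ID="req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0"
URL="protocol://host:12058/nodeGroupExecution/waitForPercentOrMsec"

# repeat until the job reports 100 percent complete
while true; do
  RESP=$(curl -s -X POST "$URL" \
    -H "Content-Type: application/json" \
    -d "{\"jobID\": \"$JOB_ID\", \"maxWaitMsec\": 10000, \"percentComplete\": 100}")
  PCT=$(echo "$RESP" | jq -r '.simpleresults.percentComplete')
  [ "$PCT" = "100" ] && break
done

# job-level outcome: "Success" or "Failure", plus an explanation
echo "$RESP" | jq -r '.simpleresults.status, .simpleresults.statusMessage'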

A REST-level failure has a status of "failure" and a rationale, as below.

{
  "message": "operations failed.",
  "rationale": "nodeGroupExecutionService/waitForPercentOrMsec threw java.lang.Exception Can't find Job Xreq_2e5089be-ac98-4cf9-8492-f57b77b3c0c0\ncom.ge.research.semtk.edc.JobTracker.getJobPercentComplete(JobTracker.java:179)\ncom.ge.research.semtk.edc.JobTracker.waitForPercentOrMsec(JobTracker.java:1195)\ncom.ge.research.semtk.services.nodeGroupExecution.NodeGroupExecutionRestController.waitForPercentOrMsec(NodeGroupExecutionRestController.java:303)\n...",
  "status": "failure"
}

Retrieve results table

For queries, when /waitForPercentOrMsec returns a simpleresults.status of "Success", data can be retrieved via the nodegroup execution service's /getResultsTable or the results service's /getTableResultsJson. For a failed query, the explanation is found in simpleresults.statusMessage.

For ingestion, a results table describing the errors is available only when the ingestion fails; a successful ingestion returns just simpleresults.statusMessage.

curl -X POST protocol://host:12058/nodeGroupExecution/getResultsTable \
-H "Content-Type: application/json" \
-d '{"jobID": "req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0" }'

Sample response:

{
  "message": "operations succeeded.",
  "table": {
    "@table": {
      "col_names": [
        "startdate",
        "finaltotal",
        "oppt_status"
      ],
      "rows": [
        [
          "2019-10-01T00:00:00",
          "0.0",
          "Proposal in progress"
        ],
        [
          "2019-10-01T00:00:00",
          "1.75028e+06",
          "Outstanding"
        ]
      ],
      "type": "TABLE",
      "col_type": [
        "http://www.w3.org/2001/XMLSchema#dateTime",
        "http://www.w3.org/2001/XMLSchema#double",
        "http://www.w3.org/2001/XMLSchema#string"
      ],
      "col_count": 3,
      "row_count": 2
    }
  },
  "status": "success"
}

Note: if your query may return very large results, you may need to switch to the SemTK results service's /getTableResultsJsonForWebClient endpoint, which returns a URL.

Note: successful ingestion jobs have no table, but simply a success status message.

Note: some versions of SemTK require the capitalization jobID instead of jobId.

Common REST parameters

Limit the number of results, or retrieve results at an offset

 "limitOverride": -1,                           // optional query LIMIT
 "offsetOverride": -1,                          // optional query OFFSET

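For example, a sketch of these fields in a dispatchSelectById request body (the nodegroup ID is a placeholder):

POST: host:12058/nodeGroupExecution/dispatchSelectById
{
 "nodegroupId": "MyNodegroup",
 "sparqlConnection": "NODEGROUP_DEFAULT",
 "limitOverride": 50,        // return at most 50 rows
 "offsetOverride": 100       // skip the first 100 rows
}
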
sparqlConnection

In the vast majority of cases, a sparqlConnection should be provided. It is a JSON string, so all quotes must be escaped, and the newlines shown below for clarity may not be accepted. This parameter is often called an override connection and is typically loaded as part of your app's configuration so that dev, test, and stage work off different data connections, and so the app is easy to update if data is moved.

POST: host:12058/nodeGroupExecution/dispatchSelectById
{  
 "nodeGroupId": "BLAST_GRC_ExpectedFunding",  
 "sparqlConnection":"{
  \"name\":\"My_conn\",
  \"domain\":\"\",
  \"enableOwlImports\":true,
  \"model\":[{
    \"type\":\"neptune\",
    \"url\":\"http://blast-cluster.cluster-ceg7ggop9fho.us-east-1.neptune.amazonaws.com:8182/\",
    \"graph\":\"http://blast-test/model\"
  }],
  \"data\":[{
    \"type\":\"neptune\",
    \"url\":\"http://blast-cluster.cluster-ceg7ggop9fho.us-east-1.neptune.amazonaws.com:8182/\",
    \"graph\":\"http://blast-test/data\"
  }]
 }"  
}  
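
One way to avoid escaping quotes by hand is to keep the connection as ordinary JSON in a file and let a tool such as jq build the request body (a sketch; the file name and nodegroup ID are placeholders, and jq is assumed to be installed):

# conn.json holds the connection as plain, unescaped JSON.
# jq -c compacts it; passing it with --arg produces a properly escaped JSON string.
curl -X POST protocol://host:12058/nodeGroupExecution/dispatchSelectById \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg conn "$(jq -c . conn.json)" \
        '{nodegroupId: "MyNodegroup", sparqlConnection: $conn}')"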

runtimeConstraints

Many endpoints accept runtime constraints of the form

"runtimeConstraints": "[{\"SparqlID\":\"?sso\",\"Operator\":\"MATCHES\",\"Operands\":[\"200001934\"]}]"

Each constraint object consists of the SPARQL ID of the item being constrained, an operator, and operands. A request-body sketch follows the operator list below.

Valid operators:

  • MATCHES - operands are a list of values to match, joined by "OR"
  • REGEX
  • GREATERTHAN
  • GREATERTHANOREQUALS
  • LESSTHAN
  • LESSTHANOREQUALS
  • VALUEBETWEEN - accepts two operands
  • VALUEBETWEENUNINCLUSIVE - accepts two operands
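
A sketch of a request body carrying two constraints (the SPARQL IDs and values are placeholders; note that runtimeConstraints is itself a JSON string, so its inner quotes are escaped):

POST: host:12058/nodeGroupExecution/dispatchSelectById
{
 "nodegroupId": "MyNodegroup",
 "sparqlConnection": "NODEGROUP_DEFAULT",
 "runtimeConstraints": "[{\"SparqlID\":\"?alloyName\",\"Operator\":\"MATCHES\",\"Operands\":[\"alloy1\",\"alloy2\"]},{\"SparqlID\":\"?weight\",\"Operator\":\"VALUEBETWEEN\",\"Operands\":[\"10\",\"20\"]}]"
}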

Error responses

All REST calls should be checked for both HTTP errors and SemTK errors.

At the HTTP level, an error response has a numeric status, an error, and a message:

Response:
{
  "timestamp": "2019-07-24T20:02:26.035+0000",
  "status": 400,
  "error": "Bad Request",
  "message": "JSON parse error: Unexpected character ('{' (code 123))",
  "path": "/nodeGroupExecution/dispatchSelectById"
}

Failures inside SemTK, on the other hand, always have a status of "failure" and a rationale:

Response:
{
  "message": "operations failed.",
  "rationale": "service: nodeGroupExecutionService method: dispatchAnyJobById() threw java.lang.Exception Could not find nodegroup with id: BLAST_GRC_ExpectedFunding NOPE\ncom.ge.research.semtk.api.nodeGroupExecution.NodeGroupExecutor.dispatchJob(NodeGroupExecutor.java:376)\ncom.ge.research.semtk.services.nodeGroupExecution.NodeGroupExecutionRestController.dispatchAnyJobById(NodeGroupExecutionRestController.java:475)\ncom.ge.research.semtk.services.nodeGroupExecution.NodeGroupExecutionRestController.dispatchSelectJobById(NodeGroupExecutionRestController.java:604)\n...",
  "status": "failure"
}

Note that these should not be confused with SemTK successfully reporting that a job failed. That case is not a service-layer "error" but successful handling of a job failure: the outer status indicates the status of the service call, while the inner simpleresults.status indicates the status of the job. For example:

Response:
{
  "message":"operations succeeded.",
  "status":"success",
  "simpleresults":{
    "status":"Failure"
  }
}

Conversely, a failure retrieving results or status may be caused by HTTP layer or service failures. When this happens, the status of the actual job is unknown until the error is corrected.
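
A sketch of checking all three layers in bash (assuming the jq tool is installed; the protocol, host, and job id are placeholders):

RESP=$(curl -s -w '\n%{http_code}' -X POST protocol://host:12058/nodeGroupExecution/waitForPercentOrMsec \
  -H "Content-Type: application/json" \
  -d '{"jobID": "req_2e5089be-ac98-4cf9-8492-f57b77b3c0c0", "maxWaitMsec":10000, "percentComplete":100 }')

HTTP_CODE=$(echo "$RESP" | tail -n1)    # HTTP layer
BODY=$(echo "$RESP" | sed '$d')

if [ "$HTTP_CODE" != "200" ]; then
  echo "HTTP error $HTTP_CODE"; exit 1
fi

if [ "$(echo "$BODY" | jq -r '.status')" != "success" ]; then    # service layer
  echo "SemTK failure: $(echo "$BODY" | jq -r '.rationale')"; exit 1
fi

if [ "$(echo "$BODY" | jq -r '.simpleresults.status // empty')" = "Failure" ]; then    # job layer
  echo "Job failed: $(echo "$BODY" | jq -r '.simpleresults.statusMessage')"; exit 1
fi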