Storing and Executing Queries

The Nodegroup Store Service and Nodegroup Execution Service allow users to store a nodegroup (usually a query) and execute it repeatedly in the future. (A nodegroup is typically a semantic query, but could also contain data loading specifications and/or a SPARQL connection.)

Storing a nodegroup

To store a nodegroup using the UI

Create a query in the SparqlGraph or SparqlForm UI. If needed, designate attributes as runtime-constrainable (meaning that they can be filled in at query runtime, and vary with each execution) using the "runtime constrained" checkbox in the Filter dialog. To store the query, use menu option Nodegroup > Save To Store (or Query > Save To Store), which will ask for an ID. Example: create a nodegroup with ID "GetMachineTestInfo" to retrieve ids, dates, and personnel for machine tests. Designate the date and personnel attributes as runtime-constrainable.

To confirm that the nodegroup stored successfully, load it using the menu option Nodegroup > Load From Store (or Query > Load From Store).

To store a nodegroup using service endpoints (for reference only; this should not be needed in most cases)

Use the /nodeGroupStore/storeNodeGroup endpoint, which requires input like the sample below.

{
  "name": "GetMachineTestInfo",
  "creator": "Jane Smith",
  "comments": "Retrieve ids, dates, and personnel for machine tests",
  "jsonRenderedNodeGroup": "...paste long string here - see below..."
}

Get the jsonRenderedNodeGroup string by creating a query using SparqlGraph and downloading it to a file. (When testing with Swagger, you need to escape the double quotes (substitute \" for "), eliminate line returns and tabs (substitute spaces for \n, \r, \t), and replace .. with .)
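
The escaping described above can also be left to a JSON library. The following is a minimal Python sketch, assuming the file downloaded from SparqlGraph is itself a valid JSON document. The function name build_store_request and the placeholder nodegroup text are illustrative, not part of SemTK, and the sketch does not attempt the .. replacement mentioned above.

```python
import json


def build_store_request(name, creator, comments, nodegroup_text):
    """Return the request body for /nodeGroupStore/storeNodeGroup as a string.

    nodegroup_text is the content of the file downloaded from SparqlGraph
    (assumed here to be a valid JSON document).
    """
    # Parsing and re-serializing compactly removes line returns and tabs.
    compact = json.dumps(json.loads(nodegroup_text), separators=(",", ":"))
    body = {
        "name": name,
        "creator": creator,
        "comments": comments,
        # The nodegroup travels as a string; the outer json.dumps call
        # below escapes its double quotes.
        "jsonRenderedNodeGroup": compact,
    }
    return json.dumps(body)


# Placeholder standing in for a downloaded file; a real nodegroup is much longer.
placeholder_nodegroup = '{\n\t"placeholder": "nodegroup content"\n}'
print(build_store_request(
    "GetMachineTestInfo",
    "Jane Smith",
    "Retrieve ids, dates, and personnel for machine tests",
    placeholder_nodegroup,
))
```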

Confirm that it worked by using the /nodeGroupStore/getNodeGroupById endpoint, with the following input, which will display the stored nodegroup.

{
  "id": "GetMachineTestInfo"
}

Executing a stored nodegroup

The user executes a stored nodegroup by providing the nodegroup ID and the runtime constraints for this execution. Example: Execute "GetMachineTestInfo" for tests run by Bob Smith.

To execute a stored nodegroup using the UI

In the SparqlGraph UI, load the nodegroup using the menu option Nodegroup > Load From Store, and then execute it.

In the SparqlForm UI, load the nodegroup using the menu option Query > Load From Store, and then execute it.

To execute a stored nodegroup using service endpoints

1. Submit the query

Use the /nodeGroupExecution/dispatchById endpoint to execute a nodegroup with a given id. You may either specify a SPARQL connection, or use the one stored with the nodegroup. You may optionally add runtime constraints (if supported by the nodegroup). Sample inputs are below.

The SPARQL connection specified here overrides any SPARQL connection stored with the nodegroup.

{
  "nodeGroupId": "GetMachineTestInfo",
  "sparqlConnection": "{\"name\": \"MachineTests\", \"type\": \"virtuoso\", \"dsURL\": \"http://localhost:2420\", \"dsKsURL\": \"\", \"dsDataset\": \"http://research.ge.com/machineTestDataset\", \"domain\": \"http://research.ge.com\"}"
}

Nodegroups may be stored with sparqlConnection information included. It is usually safest to provide a sparqlConnection as shown above, but the following hardcoded sparqlConnection parameter will execute the query using the sparqlConnection stored with the "GetMachineTestInfo" nodegroup:

{
  "nodeGroupId": "GetMachineTestInfo",
  "sparqlConnection": "{\"name\": \"%NODEGROUP%\",\"domain\": \"%NODEGROUP%\",\"model\": [],\"data\": []}"
}
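
Because sparqlConnection is a JSON document carried inside a JSON string, hand-written escaping is easy to get wrong. The sketch below, in Python, produces both variants shown above by serializing the connection twice. The helper name build_dispatch_request is hypothetical; the connection values are copied from the samples.

```python
import json

# Placeholder connection copied from the sample above. It asks the service
# to use the connection stored with the nodegroup.
STORED_CONNECTION = {
    "name": "%NODEGROUP%",
    "domain": "%NODEGROUP%",
    "model": [],
    "data": [],
}


def build_dispatch_request(nodegroup_id, connection=None):
    """Return a /nodeGroupExecution/dispatchById body as a JSON string.

    If connection is None, the %NODEGROUP% placeholder is sent instead.
    """
    chosen = STORED_CONNECTION if connection is None else connection
    return json.dumps({
        "nodeGroupId": nodegroup_id,
        # Inner json.dumps: the connection is sent as a string, not an object.
        "sparqlConnection": json.dumps(chosen),
    })


explicit = {
    "name": "MachineTests",
    "type": "virtuoso",
    "dsURL": "http://localhost:2420",
    "dsKsURL": "",
    "dsDataset": "http://research.ge.com/machineTestDataset",
    "domain": "http://research.ge.com",
}
print(build_dispatch_request("GetMachineTestInfo", explicit))
print(build_dispatch_request("GetMachineTestInfo"))
```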

To add runtime constraints to the query, use this format:

{
  "nodeGroupId": "GetMachineTestInfo",
  "sparqlConnection": "{\"name\": \"MachineTests\", \"type\": \"virtuoso\", \"dsURL\": \"http://localhost:2420\", \"dsKsURL\": \"\", \"dsDataset\": \"http://research.ge.com/machineTestDataset\", \"domain\": \"http://research.ge.com\"}",
  "runtimeConstraints": "[ { \"SparqlID\" : \"?personnel\", \"Operator\" : \"MATCHES\", \"Operands\" : [\"Bob Smith\"] } ]"
}
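
The runtimeConstraints value is likewise a JSON array encoded as a string. A small Python sketch of how it might be assembled follows; the function names are made up for illustration, and the field names (SparqlID, Operator, Operands) are taken from the sample above.

```python
import json


def runtime_constraint(sparql_id, operator, operands):
    """Build one constraint object using the field names from the sample."""
    return {
        "SparqlID": sparql_id,
        "Operator": operator,
        "Operands": list(operands),
    }


def encode_runtime_constraints(constraints):
    """Serialize the constraint list to the string form the endpoint expects."""
    return json.dumps(list(constraints))


constraints = [runtime_constraint("?personnel", "MATCHES", ["Bob Smith"])]
# This string is the value of the "runtimeConstraints" field in the request.
print(encode_runtime_constraints(constraints))
```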

A corresponding CURL call would look like this:

curl -X POST http://host:port/nodeGroupExecution/dispatchById --header "Content-Type: application/json" --header "Accept: */*" --data '{ "nodeGroupId": "GetMachineTestInfo", "runtimeConstraints": "[ { \"SparqlID\" : \"?personnel\", \"Operator\" : \"MATCHES\", \"Operands\" : [\"Bob Smith\"] } ]", "sparqlConnection":"{ \"name\": \"MachineTests\", \"type\": \"virtuoso\", \"dsURL\": \"http://localhost:2420\", \"dsKsURL\": \"\", \"dsDataset\": \"http://research.ge.com/machineTestDataset\", \"domain\": \"http://research.ge.com\" }" }'

The output is a job id, like this:

{
  "message": "operations succeeded.",
  "status": "success",
  "simpleresults": {
    "JobId": "req_61fc3f28-a8d2-4183-9d66-21fd7423492a"
  }
}
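
A client needs the JobId from this response for the next two steps. A minimal Python sketch of pulling it out is below. It assumes that a failed call reports a "status" other than "success", which the sample above does not show; extract_job_id is an illustrative name.

```python
import json


def extract_job_id(response_text):
    """Return the JobId from a dispatchById response body."""
    response = json.loads(response_text)
    # Assumption: anything other than "success" means the dispatch failed.
    if response.get("status") != "success":
        raise RuntimeError("dispatch failed: " + str(response.get("message")))
    return response["simpleresults"]["JobId"]


sample_response = """
{
  "message": "operations succeeded.",
  "status": "success",
  "simpleresults": {
    "JobId": "req_61fc3f28-a8d2-4183-9d66-21fd7423492a"
  }
}
"""
print(extract_job_id(sample_response))
```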

2. Check the status until execution is complete

While running, the job status can be checked using the Status Service, for example with the CURL call below. Note that you must paste in the job id from above.

curl -X POST http://host:port/status/getStatus --header "Content-Type: application/json" --header "Accept: */*" --data '{ "jobId": "XXXX" }'

The output is a status message, like these:

{
    "message": "operations succeeded.",
    "status": "success",
    "simpleresults": {
        "status": "InProgress"
    }
}
{
    "message": "operations succeeded.",
    "status": "success",
    "simpleresults": {
        "status": "Success"
    }
}

The status is also available as a passthrough in the Nodegroup Execution Service (/nodeGroupExecution/getJobCompletionPercentage).
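
Polling until the job finishes could be sketched in Python as follows. The caller supplies fetch_status, a hypothetical function that posts the job id to /status/getStatus and returns the parsed reply. Only the "InProgress" and "Success" values appear in the samples above, so the sketch returns any other value to the caller unchanged; the default interval and timeout are arbitrary.

```python
import time


def wait_for_job(fetch_status, job_id, poll_seconds=2.0, timeout_seconds=300.0):
    """Poll until the job status is no longer "InProgress", then return it.

    fetch_status(job_id) must return the parsed status reply as a dict.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        reply = fetch_status(job_id)
        job_status = reply["simpleresults"]["status"]
        if job_status != "InProgress":
            return job_status
        if time.monotonic() >= deadline:
            raise TimeoutError("job " + job_id + " still in progress")
        time.sleep(poll_seconds)


# Demonstration with canned replies instead of a live Status Service.
canned = iter([
    {"simpleresults": {"status": "InProgress"}},
    {"simpleresults": {"status": "Success"}},
])
print(wait_for_job(lambda job_id: next(canned), "XXXX", poll_seconds=0))
```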

3. Retrieve the results

When the status indicates that the job is complete, the results can be retrieved using a call like the one below. Note that you must paste in the job id from above.

curl -X POST http://host:port/results/getTableResultsJson --header "Content-Type: application/json" --header "Accept: */*" --data '{ "jobId": "XXXX" }'

The results are also available as a passthrough in the Nodegroup Execution Service (/nodeGroupExecution/getResultsLocation endpoint, which gives a CSV file location containing query results).
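
The curl calls in this section can be reproduced with Python's standard library. The sketch below mirrors the headers and body of the curl example; build_post and fetch_table_results are illustrative names, and because the shape of the results reply is not described here, the parsed JSON is returned as-is.

```python
import json
import urllib.request


def build_post(base_url, path, payload):
    """Prepare a JSON POST request equivalent to the curl calls above."""
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "*/*"},
        method="POST",
    )


def fetch_table_results(base_url, job_id):
    """Post the job id to /results/getTableResultsJson and parse the reply."""
    request = build_post(base_url, "/results/getTableResultsJson", {"jobId": job_id})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like fetch_table_results("http://host:port", job_id), with the host, port, and job id filled in for your deployment.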

Runtime constraint details

Supported runtime-constrainable types are as follows (from com.ge.research.semtk.load.utility.ImportSpecHandler):

//   from the XSD data types:
//   string | boolean | decimal | int | integer | negativeInteger | nonNegativeInteger | 
//   positiveInteger | nonPositiveInteger | long | float | double | duration | 
//   dateTime | time | date | unsignedByte | unsignedInt | anySimpleType |
//   gYearMonth | gYear | gMonthDay;
             
//   added for the runtimeConstraint:
//   NODE_URI

Supported runtime constraint operations are as follows (from com.ge.research.semtk.belmont.runtimeConstraints.SupportedOperations):

MATCHES,                // value matches one of the operands (accepts collections)
REGEX,                  // value matches the string indicated by the given operand
GREATERTHAN,            // value is greater than the operand
GREATERTHANOREQUALS,    // value is greater than or equal to the operand
LESSTHAN,               // value is less than the operand
LESSTHANOREQUALS,       // value is less than or equal to the operand
VALUEBETWEEN,           // value is between the given operands, including both endpoints
VALUEBETWEENUNINCLUSIVE // value is between the given operands, not including endpoints
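
A client can catch malformed constraints before sending them. The Python sketch below checks the operator name against the list above. The operand counts it enforces (two for the VALUEBETWEEN variants, one or more for MATCHES, exactly one otherwise) are an assumption inferred from the comments above, not taken from the SemTK source; check_constraint is an illustrative name.

```python
SUPPORTED_OPERATIONS = {
    "MATCHES", "REGEX", "GREATERTHAN", "GREATERTHANOREQUALS",
    "LESSTHAN", "LESSTHANOREQUALS", "VALUEBETWEEN", "VALUEBETWEENUNINCLUSIVE",
}
# Assumption: "between the given operands" means exactly two operands.
TWO_OPERAND_OPERATIONS = {"VALUEBETWEEN", "VALUEBETWEENUNINCLUSIVE"}


def check_constraint(constraint):
    """Return a list of problems found in one runtime constraint dict."""
    operator = constraint.get("Operator")
    operands = constraint.get("Operands", [])
    if operator not in SUPPORTED_OPERATIONS:
        return ["unsupported operator: " + str(operator)]
    if operator == "MATCHES":
        count_ok = len(operands) >= 1  # accepts a collection of values
    elif operator in TWO_OPERAND_OPERATIONS:
        count_ok = len(operands) == 2
    else:
        count_ok = len(operands) == 1
    if count_ok:
        return []
    return [operator + " got " + str(len(operands)) + " operand(s)"]


print(check_constraint(
    {"SparqlID": "?personnel", "Operator": "MATCHES", "Operands": ["Bob Smith"]}
))
```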