Queries Advanced Topics - ge-semtk/semtk GitHub Wiki




OPTIONAL for Partial Results

One of the biggest differences between a SPARQL semantic web query and a relational query is how missing data (the relational null) is handled: SPARQL uses an OPTIONAL clause. Semantic web queries are graph patterns, and only data which completely matches the pattern is returned. To return partial results, the OPTIONAL clause must be used. Consider this example where we want "REQUIREMENTs and their TESTs, TEST_RESULTs, and TEST_STATUS":

Note that without the OPTIONAL clauses, only REQUIREMENTs which have a TEST, TEST_RESULT, and TEST_STATUS would be returned. With the OPTIONAL clauses, all REQUIREMENTs which have an identifier will be returned, and their TEST, TEST_RESULT, and TEST_STATUS will be shown if they exist.

Further note the use of "reverse optional", displayed as } optional verifies in SPARQLgraph, and highlighted here in yellow. The reverse is used because we want the Requirement to be returned every time and the rest of the graph is optional. Since the verifies relationship is pointing towards Requirement, the optional needs to be reversed so that it takes effect in the direction opposite that of the relationship. This technique is repeated for the confirms relationship but not for the result relationship, which already points in the desired direction.
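The pattern described above might be written roughly as follows. This is a hand-written sketch, not SemTK's generated output: the ex: namespace and the exact class and property URIs (including ex:identifier) are assumptions made for illustration. Only the verifies, confirms, and result relationships and their directions are taken from the text.

```sparql
PREFIX ex: <http://example.org/requirements#>   # hypothetical namespace

SELECT DISTINCT ?identifier ?Test ?TestResult ?TestStatus
WHERE {
  # always required: the requirement and its identifier
  ?Requirement a ex:REQUIREMENT .
  ?Requirement ex:identifier ?identifier .

  # "reverse optional": verifies points from TEST to REQUIREMENT,
  # but the TEST side is the optional part
  OPTIONAL {
    ?Test ex:verifies ?Requirement .

    # reversed again: confirms points from TEST_RESULT to TEST
    OPTIONAL {
      ?TestResult ex:confirms ?Test .

      # result already points away from the requirement, so a plain optional
      OPTIONAL {
        ?TestResult ex:result ?TestStatus .
      }
    }
  }
}
```

A requirement with no test still produces one row, with ?Test, ?TestResult, and ?TestStatus left unbound.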

MINUS for Negative Results

Now consider the example where we want the partial results described above, but only for REQUIREMENTs which do not have a test status of passed. This can be accomplished with a MINUS clause which describes the graph patterns which we DO NOT want to match.

In the example below, a single MINUS clause is used to describe REQUIREMENTs which have TEST, TEST_RESULT, and TEST_STATUS. Hidden inside the TEST_STATUS dialog is the requirement that it equals "Passed", shown here:

As described above in the OPTIONAL section, reverse MINUS is used since the verifies property points in the direction opposite to the one in which we would like the MINUS clause to work.

The results will be the partial results described above in the OPTIONAL section, but limited to those REQUIREMENTs which do not have a TEST, TEST_RESULT, and TEST_STATUS of "Passed". Only non-passing examples are returned.
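A rough hand-written equivalent of this query is sketched below. It is not SemTK's generated SPARQL. The ex: namespace and URIs are hypothetical, the status is assumed to be a literal that can be compared to "Passed", and the variables inside the MINUS are deliberately given separate names (an assumption) so that only ?Requirement is shared with the outer pattern.

```sparql
PREFIX ex: <http://example.org/requirements#>   # hypothetical namespace

SELECT DISTINCT ?identifier ?Test ?TestResult ?TestStatus
WHERE {
  ?Requirement a ex:REQUIREMENT .
  ?Requirement ex:identifier ?identifier .

  # partial results, as in the OPTIONAL section
  OPTIONAL {
    ?Test ex:verifies ?Requirement .
    OPTIONAL {
      ?TestResult ex:confirms ?Test .
      OPTIONAL { ?TestResult ex:result ?TestStatus . }
    }
  }

  # remove every requirement that has a passing chain
  MINUS {
    ?PassedTest ex:verifies ?Requirement .
    ?PassedResult ex:confirms ?PassedTest .
    ?PassedResult ex:result ?PassedStatus .
    FILTER ( str(?PassedStatus) = "Passed" )
  }
}
```

Because the MINUS shares ?Requirement with the pattern on its left, it has something to subtract from, unlike a MINUS placed alone inside a UNION branch.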

Beware the Union Minus

It may feel intuitive to write the query as a tree of unions (see Unions) with MINUS clauses. However, the W3C variable scoping rules and the W3C scope of filters describe how variables and filters inside the UNION {} are not scoped in a manner that would make this seemingly intuitive query return the intended results.

Summary: this query will not work. Inside UNION { MINUS {} }, the MINUS will not match anything, so it removes nothing (see Combining UNION with MINUS).

SubClasses and SubProperties

When a class is added to the query canvas, any queries generated will search for that class or any of its sub-classes.

Likewise, when a property is used (an edge Object Property or a Data Property), any queries generated will search for that property or any sub-properties. For a sub-property to be considered, it must also have a Domain of a super-class* or sub-class* of the subject class.

To query specific sets of sub-properties or sub-classes, use the union query.
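As a sketch of what this looks like in SPARQL: the class pattern below follows the rdfs:subClassOf* form that appears in the generated queries later on this page, while the rdfs:subPropertyOf* form for the edge is an assumption about how sub-properties could be matched, not a copy of SemTK output. The ex: namespace and the Cat and hasKitties names are used only for illustration, and the Domain check on sub-properties is not shown.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/animals#>   # hypothetical namespace

SELECT DISTINCT ?Cat ?Kitty
WHERE {
  # the class itself or any of its sub-classes
  ?Cat a ?Cat_type .
  ?Cat_type rdfs:subClassOf* ex:Cat .

  # the property itself or any of its sub-properties
  ?Cat ?hasKitties_prop ?Kitty .
  ?hasKitties_prop rdfs:subPropertyOf* ex:hasKitties .
}
```

The * path operator matches zero or more steps, which is why the class or property itself is included along with its descendants.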

Unions

SPARQLgraph can be used to generate union queries. In general, a union query is an "OR" where expressions share a set of sparqlIds. Creation of union queries in SPARQLgraph is based on the following:

  • create branch points for the union in a property or class; these will be marked with a colored Union symbol U
  • subgraphs beyond the branch point will be shown in the matching color
  • inside unions, sparqlId restrictions are loosened so that items from each branch can refer to the same return value
  • unions may be nested

The creation of union queries will be demonstrated with a brief tutorial using an ontology with a simple battery containing multiple colored cells. Consider these five batteries, each with up to four cells:

| description | batt_ID | cell1_date | cell1_ID | cell1_color | cell2_ID | cell2_color | cell3_id | cell3_color | cell4_id | cell4_color |
|---|---|---|---|---|---|---|---|---|---|---|
| normal battery | battAA | 2017-03-23T10:23:00 | A | red | B | blue | C | white | D | white |
| normal battery | battAB | 2017-03-23T10:24:00 | E | red | F | blue | G | white | H | white |
| no date | battAC | | I | red | J | blue | K | white | L | white |
| no colors on cells | battX | 2017-03-23T10:26:00 | M | | N | | O | | P | |
| two cells | battY | 2017-03-23T10:27:00 | Q | blue | R | blue | | | | |

Union on object properties

Consider the following query:

Find all cells with color "red" OR with no color at all.
Return the cellId, along with the battery id and name.

Such a query looks like this in SPARQLgraph:

and is built with the following steps:

  1. create a Battery, and set ?id and ?name to be returned
  2. add a Cell, and set ?cellId to be returned
  3. add a color, and constrain it to "red". Using the "suggest values" button is helpful here. Also, remember to uncheck the "return" box so the color is not returned.
  4. create a union by selecting the "cell" arc and choosing "new union" off the "opt/minus/union" menu. At this point your subgraph will be rendered with a unique color, and the arc will be marked with a U
  5. add another Cell to the Battery.
  6. add to the union by selecting the new "cell" arc and choosing the "cell" union off the "opt/minus/union" menu. Now that this subgraph is added to the union, go back to the new Cell and return ?cellId making sure to use the same "cellId" sparqlId as the Cell in step 2.
  7. add a color to the new cell, and select the new color arc and choose "minus" off the "opt/minus/union" menu.

You now have a union with two subgraphs. The top subgraph matches all cells with color red. The bottom subgraph matches all cells with no color. The "?cellId" sparqlId is shared between the branches. To make it easy to inspect results, order by "cellId".

The following SPARQL is generated:

prefix ...
select distinct ?id ?name ?cellId
		FROM <http://your/graph>
 where {
	?Battery a ?Battery_type .
	?Battery_type  rdfs:subClassOf* batterydemo:Battery.
	?Battery batterydemo:id ?id .
	?Battery batterydemo:name ?name .
	{
		?Battery batterydemo:cell ?Cell_1 .
			BIND(?Cell_1 as ?Cell) .
			?Cell_1 batterydemo:cellId ?cellId1 .
			BIND(?cellId1 as ?cellId) .
			?Cell_1 batterydemo:color ?Color_1 .
				FILTER ( ?Color_1 IN (<http://kdl.ge.com/batterydemo#red> ) ) . 
	}
	 UNION 
	{
		?Battery batterydemo:cell ?Cell .
			?Cell batterydemo:cellId ?cellId .
			minus {
				?Cell batterydemo:color ?Color .
			}
	}
}
ORDER BY ?cellId

Note that under the hood, each item in the graph has a unique identifier. BIND statements are used to match ?cellId between the two subgraphs in the UNION.

This query returns all the red and colorless cells:

| id | name | cellId |
|---|---|---|
| battAA | normal battery | A |
| battAB | normal battery | E |
| battAC | no date | I |
| battX | no colors on cells | M |
| battX | no colors on cells | N |
| battX | no colors on cells | O |
| battX | no colors on cells | P |

Union on two data properties

For the sake of illustration, consider this query:

Find all batteries with the letter 'y' in the id or in the name.
Return the batteries' ids and names.

Such a query looks like this in SPARQLgraph:

and is built with the following steps:

  • Add the Battery node to the nodegroup
  • Select id:
    • choose 'new union' from the menu
    • apply the filter FILTER regex(?id, "[Yy]")
  • Select name:
    • choose 'id' union from the menu
    • apply the filter FILTER regex(?name, "[Yy]")

You now have a union query that will return names and ids of all batteries that have the letter 'y' in the name or id.

The query will look like this:

prefix ...
select distinct ?id ?name
		FROM <http://your/graph>
 where {
	?Battery a ?Battery_type .
	?Battery_type  rdfs:subClassOf* batterydemo:Battery.
	{
		?Battery batterydemo:id ?id .
			FILTER regex(?id, "[Yy]")   .
	}
	 UNION 
	{
		?Battery batterydemo:name ?name .
			FILTER regex(?name, "[Yy]") .
	}
}

and, given the data shown in the table above, will return the results:

| id | name |
|---|---|
| | normal battery |
| battY | |

Union on two separate subgraphs

Now consider this query:

Find the id that belongs to any battery OR any blue cell

This query is the union of two disconnected subgraphs. It will look like this:

and is built with the following steps:

  • Add the Battery node to the nodegroup
    • return the ?id
    • open the class URI and select "new union", and de-select "return"
  • drag a Cell node, such that it is disconnected
    • return the cellId as ?id
    • open the class URI and select the "?Battery" union, and de-select "return"
  • Add a Color to the Cell, and constrain it to "blue", de-selecting "return"

This results in a query that is the union of the two subgraphs, each of which returns something for ?id.

The query looks like this:

prefix ...
select distinct ?id
		FROM <http://your/graph>
 where {
	{
		?Cell a batterydemo:Cell .
		?Cell batterydemo:cellId ?id_0 .
		BIND(?id_0 as ?id) .
		?Cell batterydemo:color ?Color .
			FILTER ( ?Color IN (<http://kdl.ge.com/batterydemo#blue> ) ) . 
	}
	 UNION 
	{
		?Battery a ?Battery_type .
		?Battery_type  rdfs:subClassOf* batterydemo:Battery.
		?Battery batterydemo:id ?id .
	}
}

and it returns the id of every battery and every blue cell:

id
F
B
J
R
Q
battAB
battAA
battX
battAC
battY

Combining UNION with MINUS

Consider the query "Cat named fluffy OR Cat does not have a kitty".

It may be tempting to create a Cat and do a UNION on FILTER regex(?name, "fluffy") and MINUS hasKitty. That is, a union of a data property and MINUS an object property.

SemTK would create SPARQL like this:

?Cat a namespace:Cat .
{
   ?Cat namespace:name ?name.
   FILTER regex (?name, "fluffy").
} UNION {
   MINUS { ?Cat namespace:hasKitty ?Kitty  }
}

Given the W3C recommendation, since the MINUS clause has no left-hand side, it removes nothing and the branch always succeeds. This query will return all cats.

Instead, build the query as described in "Union on two separate subgraphs". Once both Cat nodes are added to the union, both can be named ?Cat and both name properties can be named ?name. The Cat node that stands alone, with no hasKitties edge, holds the ?name with the FILTER regex(?name, "fluffy").

This will generate SPARQL like this:

{
    ?Cat a AnimalSubProps:Cat .
    ?Cat AnimalSubProps:name ?name .
    minus {
        ?Cat AnimalSubProps:hasKitties ?Kitty .
    }
} UNION {
    ?Cat a AnimalSubProps:Cat .
    ?Cat AnimalSubProps:name ?name .
    FILTER regex(?name, "fluffy") .
}

And this will return all cats named "fluffy" plus all cats which do not have kitties.

Construct Queries

CONSTRUCT queries return results in graph form instead of as a table, thus taking full advantage of the semantic web stack. This type of query is accessed by setting the query dropdown (highlighted below in yellow) to construct.

Rules for building CONSTRUCT queries:

  • any node and edge shown on the canvas are constructed
  • any data properties selected for return are constructed
  • any constraints are applied in the query WHERE clause
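These rules can be illustrated with a hand-written sketch based on the battery example used earlier on this page. It is not SemTK's generated output. The batterydemo: prefix URI is assumed from the color URI shown in the union queries, and the filter on ?id is a made-up constraint on a data property that is not selected for return.

```sparql
PREFIX rdfs:        <http://www.w3.org/2000/01/rdf-schema#>
PREFIX batterydemo: <http://kdl.ge.com/batterydemo#>   # assumed prefix URI

CONSTRUCT {
  # nodes and edges shown on the canvas
  ?Battery a ?Battery_type .
  ?Battery batterydemo:cell ?Cell .
  ?Cell a ?Cell_type .
  # data property selected for return
  ?Battery batterydemo:name ?name .
}
FROM <http://your/graph>
WHERE {
  ?Battery a ?Battery_type .
  ?Battery_type rdfs:subClassOf* batterydemo:Battery .
  ?Battery batterydemo:name ?name .
  ?Battery batterydemo:cell ?Cell .
  ?Cell a ?Cell_type .
  ?Cell_type rdfs:subClassOf* batterydemo:Cell .

  # constraint only: ?id restricts the match but is not constructed
  ?Battery batterydemo:id ?id .
  FILTER regex(?id, "battA")
}
```

The template after CONSTRUCT lists what is built, while the WHERE clause carries everything needed to match, including constraints that never appear in the output graph.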

Graphical results

Hovering the mouse over a node will show:

  • the URI of any class node
  • the type of any data

Results are interactive:

  • Double-clicking a node, or selecting it and hitting the Expand button, will add all one-hop connections to the display
  • Selecting a node and hitting the Remove button will remove a node from the display only (this is NOT a delete query!)

Labeling nodes in CONSTRUCT results

Choosing the menu options->Construct label property... allows you to specify a string property that will be used to label the nodes in a CONSTRUCT query results graph.

For example, choosing name would cause results of a cat query like the one above to display with the name as the node label.

There are some important considerations and limitations:

  • this functionality will not change your query. If the query does not return the name property of a node, that node will have the default label (typically the class)
  • when used as a label, name will not appear as a data property
  • when used as a label, name will appear along with Type and the URI in the mouseover
  • this functionality will change the expand function so that when you double-click or hit the "Expand" button, names will automatically be retrieved and used as the node labels
  • the functionality is limited to string data properties
  • only one property may be used at a time
  • the property is stored in your browser cookies

JSON-LD results

A download link, "results.json", is provided; it downloads the results as a file in JSON-LD format.

Note that different triplestores have been observed to interpret the JSON-LD format differently:

  • a link to another object may be of the form { "@id": "ID123" } or just the string "ID123"
  • data properties may be typed { "@value": "35", "@type": "integer" } or may be strings "35"
  • types and URIs may be prefixed in full "uri://my/prefix#uri123" or abbreviated based on query prefixes "prefix:uri123"

The SPARQLgraph interface attempts to resolve these differences and show a standard network format.

Recursive Subtree Query

A recursive construct query can be used to query a particular sub-tree of instance data. This query demonstrates how to construct the tree of the cat "grannymom" (shown above) and all of her descendants.

Start with a construct query that has three Cat nodes connected by the hasKitties predicate:

  • the target: grannymom
  • a generic parent ?Cat_Parent
  • a generic child ?Cat_child

Building this query will involve some subtleties, but the final version will have a meaning that might be described in English as "For all Cats who are kitty descendants of the Cat 'grannymom', construct the cat and optionally any kittens."

Why do I see the error: Qualifier unsupported in CONSTRUCT clause: hasKitties*

If you build this nodegroup in a certain order (before completing the step below), you will experience the error message above. This reflects a subtlety in CONSTRUCT queries. SemTK and SPARQL do not want to create a link in your results that looks like "hasKitties" but actually reflects "hasKitties->hasKitties->hasKitties" (or any instance of hasKitties*) in the triplestore data. To avoid this misleading result data, SemTK will not construct a link containing a qualifier. By turning off CONSTRUCT in the parent node--as described next--the top node and link to it become constraints, and are not constructed.

Next set up the target Cat node such that it is not constructed, and that the name matches "grannymom". This is accomplished by selecting the ?Cat field and unclicking the "construct" field in the dialog:

Then click on ?name, select it for return, and set the filter to "grannymom":

The target node now matches the correct parent, and it will not be constructed. Now complete the following steps:

  • set the Cat's outgoing hasKitties to have the qualifier *. This ensures that any ?Cat_Parent that has 0 or more hasKitties relationships back to "grannymom" will be constructed
  • select ?Cat_Parent's name field to be returned/constructed by choosing the "select" checkbox
  • set ?Cat_Parent's outgoing hasKitties to optional. This ensures that a descendant of "grannymom" will be constructed even if it has no hasKitties
  • set ?Cat_child name field to be returned/constructed

The resulting query will construct a tree of "grannymom," all her descendants, and their names.
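The finished nodegroup corresponds roughly to the hand-written sketch below. It is not SemTK's generated output. The AnimalSubProps: prefix URI is hypothetical, type triples are omitted for brevity, and the variable names ?parentName and ?childName are invented here to keep the three name fields apart.

```sparql
PREFIX AnimalSubProps: <http://example.org/AnimalSubProps#>   # hypothetical URI

CONSTRUCT {
  # ?Cat (grannymom's target node) and the hasKitties* link are NOT constructed
  ?Cat_Parent AnimalSubProps:name ?parentName .
  ?Cat_Parent AnimalSubProps:hasKitties ?Cat_child .
  ?Cat_child AnimalSubProps:name ?childName .
}
WHERE {
  # target node: used only as a constraint
  ?Cat AnimalSubProps:name ?name .
  FILTER regex(?name, "grannymom")

  # zero or more hasKitties hops: grannymom herself and every descendant
  ?Cat AnimalSubProps:hasKitties* ?Cat_Parent .
  ?Cat_Parent AnimalSubProps:name ?parentName .

  # optional, so that cats without kittens are still constructed
  OPTIONAL {
    ?Cat_Parent AnimalSubProps:hasKitties ?Cat_child .
    ?Cat_child AnimalSubProps:name ?childName .
  }
}
```

The property path with * appears only in the WHERE clause. A CONSTRUCT template may contain only plain triples, which is the SPARQL-level reason behind the "Qualifier unsupported" error described above.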

Delete Queries

DELETE queries work differently from CONSTRUCT queries, in that:

  • any node and edge shown on the canvas are added to the WHERE clause
  • any data properties selected for return are added to the WHERE clause
  • items to be deleted must be explicitly specified

Specifying items to delete

Data property and object property edge dialogs have "select for delete" check boxes.

Node dialogs (accessed by clicking on the class name) contain a menu with a choice of delete modes:

  • NO_DELETE
  • TYPE_INFO_ONLY - only delete type triples with this node's matching URIs as the subject
  • FULL_DELETE - delete all triples with this node's matching URIs in the subject or object
  • LIMITED_TO_MODEL - like FULL_DELETE, but limited to relationships specified in the model
  • LIMITED_TO_NODEGROUP - like FULL_DELETE but limited only to relationships in the nodegroup

FULL_DELETE on nodes is by far the most common type of delete query.
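As an illustration of the FULL_DELETE description above, a delete of one Cell from the battery example could be hand-written as shown below. This is a sketch of the idea only. SemTK's generated SPARQL may be structured differently, the batterydemo: prefix URI is assumed, and the choice of cell "A" is arbitrary.

```sparql
PREFIX batterydemo: <http://kdl.ge.com/batterydemo#>   # assumed prefix URI

WITH <http://your/graph>
DELETE {
  # all triples with the matched node as subject or as object
  ?Cell ?p_out ?o .
  ?s ?p_in ?Cell .
}
WHERE {
  # nodegroup contents select which node(s) to delete
  ?Cell batterydemo:cellId ?cellId .
  FILTER ( str(?cellId) = "A" )

  # each solution binds either the outgoing or the incoming triple
  { ?Cell ?p_out ?o . }
  UNION
  { ?s ?p_in ?Cell . }
}
```

Template triples that contain an unbound variable are skipped, so each solution removes exactly the one triple it matched.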

Optimizations Internal

SemTK attempts to optimize queries based on performance testing of different triplestores.

VALUES clauses vs FILTER IN

This choice is made in ingestion URILookups, which can require several queries per row of ingestion data. Hence it can have a very large performance impact.

  • FILTER IN is preferred for AWS Neptune
  • other triple stores are more performant with a VALUES clause
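For reference, the two forms look like this in a hand-written lookup against the battery example. This is an illustrative sketch, not the query SemTK issues during ingestion. The batterydemo: prefix URI is assumed, and the ids are assumed to be stored as plain string literals.

```sparql
PREFIX batterydemo: <http://kdl.ge.com/batterydemo#>   # assumed prefix URI

SELECT ?Battery ?id
FROM <http://your/graph>
WHERE {
  # Form 1: VALUES clause
  VALUES ?id { "battAA" "battAB" }
  ?Battery batterydemo:id ?id .

  # Form 2: FILTER IN. To try it, remove the VALUES line above and uncomment:
  # FILTER ( ?id IN ( "battAA", "battAB" ) )
}
```

Under the plain-string assumption both forms select the same rows, so the choice between them is purely a performance matter for the triple store.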

rdfs:subClassOf*

This is a very common query clause since a node in a nodegroup typically matches all subclasses.

  • Blazegraph performs best with rdfs:subClassOf*
  • other triple stores are more performant with a list of classes in a VALUES clause
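
The two alternatives can be sketched as follows. The first form is the one that appears in the generated queries earlier on this page. The second is a hand-written illustration in which batterydemo:LithiumBattery is a hypothetical sub-class, and the batterydemo: prefix URI is assumed.

```sparql
PREFIX rdfs:        <http://www.w3.org/2000/01/rdf-schema#>
PREFIX batterydemo: <http://kdl.ge.com/batterydemo#>   # assumed prefix URI

SELECT DISTINCT ?Battery
FROM <http://your/graph>
WHERE {
  ?Battery a ?Battery_type .

  # Form 1: property path, resolved by the triple store at query time
  ?Battery_type rdfs:subClassOf* batterydemo:Battery .

  # Form 2: explicit class list. To try it, replace the line above with:
  # VALUES ?Battery_type { batterydemo:Battery batterydemo:LithiumBattery }
}
```

The explicit list requires the sub-classes to be known before the query is built, whereas the property path leaves that work to the triple store.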