Graph Validation - ge-semtk/semtk GitHub Wiki

SemTK provides several ways to validate the contents of a knowledge graph:

Validation during ingestion

The SemTK ingestion process performs basic data validation before loading data into a graph. It checks that the data to be loaded conforms to the model (excluding cardinality requirements). Additional ingestion-time validation checks can be configured as well.

Validation against OWL cardinality restrictions

SemTK provides capabilities for checking if data conforms to the OWL cardinality restrictions found in its ontology.

A sample cardinality restriction is found in the final line of SADL below:

FruitBasket is a type of Thing,
	described by includes with values of type Fruit.
includes of FruitBasket has at most 3 values.
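
For readers more familiar with OWL than SADL, the restriction above corresponds roughly to the Turtle sketch below. This is an illustration only: the dbex namespace URI is a placeholder assumption, and the exact triples SADL generates (for example, the datatype of the literal) may differ.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix dbex: <http://example.org/dbex#> .   # placeholder namespace (assumption)

# A FruitBasket may have at most 3 values for the includes property
dbex:FruitBasket
	a owl:Class ;
	rdfs:subClassOf [
		a owl:Restriction ;
		owl:onProperty dbex:includes ;
		owl:maxCardinality "3"^^xsd:nonNegativeInteger ;
	] .
```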

The SemTK Ontology Info Service (endpoint ontologyinfo/getCardinalityViolations) and its clients provide access to cardinality restriction violations.

To browse cardinality restrictions in SPARQLgraph, use the Explore tab in "Restrictions" mode. The example below shows a fruit basket that includes 4 fruits, exceeding the maximum of 3.

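[Screenshot: SPARQLgraph Explore tab in "Restrictions" mode, showing a fruit basket that includes 4 fruits]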

In the UI above, "violations" are cases where the actual number of values exceeds a maximum cardinality restriction (e.g. 4 fruits exceeding the limit of 3 fruits per basket). In contrast, "incomplete data" refers to cases where the actual number of values falls below a minimum cardinality requirement (e.g. the model requires an Address to have a recipient, but the data contains an Address instance with no recipient).

Validation using SHACL (Shapes Constraint Language)

SHACL (Shapes Constraint Language) is a W3C Recommendation: a language for validating RDF graphs against a set of conditions ("shapes").

The following is a sample SHACL shape (further examples can be found in DeliveryBasketExample-shacl.ttl and RACK-shacl.ttl):

### A FruitBasket must include between 1 and 3 fruits
### A FruitBasket expiration date must be later than pack date
dbex:FruitBasketConforms
	a sh:NodeShape;
	sh:targetClass dbex:FruitBasket;
	sh:property [
		sh:path 	dbex:includes;
		sh:minCount 	1;
		sh:maxCount 	3;
	];
	sh:property [
		sh:path 	dbex:packDate;
		sh:lessThan 	dbex:expirationDate;
	];
	.

The SemTK Utility Service endpoint utility/getShaclResults validates the data in a SPARQL connection against a set of SHACL shapes.

To browse SHACL results in SPARQLgraph, use the Explore tab in "SHACL Validation" mode. The example below shows a fruit basket that has an expiration date preceding its pack date, violating the SHACL shape above.

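[Screenshot: SPARQLgraph Explore tab in "SHACL Validation" mode, showing a fruit basket whose expiration date precedes its pack date]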

Tips for writing SHACL shapes:

  • You may define sh:message for a shape. If sh:message is not present, the SHACL processor generates a message. For some constraint types (e.g. sh:minCount, sh:maxCount, sh:minLength, sh:maxLength), the generated message may be more informative than a custom one. For example, the generated message for sh:maxCount includes the actual instance count found. Likewise, the generated message for sh:maxLength includes the offending string.
  • The SHACL specification includes sh:description (for property shapes only), but SemTK does not include it in its SHACL output because it does not appear to be accessible via the Jena SHACL Java API. Use sh:message instead.
  • When specifying a constraint that takes a shape as input (e.g. sh:node), you may define that shape either inline (it becomes a blank node) or as a named shape defined elsewhere. The latter option lets you give the shape a descriptive name, which may result in a more understandable violation message.
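
The tips above can be illustrated with a short sketch. It attaches a custom sh:message to the sh:lessThan constraint, leaves sh:maxCount to the generated message, and references a named shape through sh:node. The dbex namespace URI, the dbex:deliveryAddress property, the dbex:recipient property, and the shape names are hypothetical and are not taken from the SemTK example files.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dbex: <http://example.org/dbex#> .   # placeholder namespace (assumption)

# Named shape: its descriptive name can appear in violation output
dbex:AddressHasRecipient
	a sh:NodeShape ;
	sh:property [
		sh:path 	dbex:recipient ;          # hypothetical property
		sh:minCount 	1 ;
	] .

dbex:FruitBasketTipsExample
	a sh:NodeShape ;
	sh:targetClass dbex:FruitBasket ;
	# No sh:message here: the generated message reports the actual count
	sh:property [
		sh:path 	dbex:includes ;
		sh:maxCount 	3 ;
	] ;
	# Custom message for a constraint type not in the list above
	sh:property [
		sh:path 	dbex:packDate ;
		sh:lessThan 	dbex:expirationDate ;
		sh:message 	"Pack date must precede expiration date" ;
	] ;
	# Reference the named shape rather than defining it inline
	sh:property [
		sh:path 	dbex:deliveryAddress ;    # hypothetical property
		sh:node 	dbex:AddressHasRecipient ;
	] .
```

Had the address shape been written inline inside sh:node, it would be a blank node with no name to report.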