Path Finding - ge-semtk/semtk GitHub Wiki

The SPARQLgraph canvas lets you drag and drop a new class into a nodegroup and then performs path-finding to determine the possible ways the new class can be connected. A connection can pass through one or more new intermediate classes, creating a multi-hop path.

Path-finding modes

Path-finding has several modes available:

  1. model - uses only the ontology to determine paths. This method can suggest many paths very quickly in a simple ontology. However, in ontologies where many classes share common superclasses and/or there are many sub-property relationships, the number of possible connections may be very large and confusing.

  2. predicate data - suggests only predicates for which the connection contains instance data matching both the predicate and the classes of its subject and object. This method can be slower, as a cache of statistics about predicates occasionally needs to be built. However, it can generate much better-targeted suggestions.

  3. nodegroup data - (EXPERIMENTAL) finds only paths for which the resulting nodegroup matches some instance data in the loaded connection. This is powerful, but also the slowest method of path-finding. To avoid locking your session or the triplestore with long-running queries, this approach can skip paths when its queries don't return quickly. When combined with the overall path-finding timeout, this method is likely to return only a small subset of the possible paths, and hence is considered experimental.

Nodegroup data mode is experimental and disabled by default. It can generate queries that take a very long time to complete, so a query timeout is applied to limit them. Unfortunately, Fuseki has been observed to freeze after a number of long-running queries have timed out, and it then needs to be restarted.

To enable nodegroup data mode, select the Help->Test menu item on the main tab of SPARQLgraph.
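The difference between model mode and predicate data mode can be illustrated with a small sketch. This is not SemTK's implementation; it is a toy breadth-first search under stated assumptions. The ontology triples, the instance counts, and the names `ONTOLOGY`, `INSTANCE_COUNTS`, and `find_paths` are all hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical toy ontology: (domain class, object property, range class).
ONTOLOGY = [
    ("Test", "hasMeasurement", "Measurement"),
    ("Measurement", "hasUnit", "Unit"),
    ("Test", "performedBy", "Person"),
    ("Person", "worksAt", "Lab"),
    ("Measurement", "takenAt", "Lab"),
]

# Hypothetical instance counts per triple pattern; a missing entry means no data.
INSTANCE_COUNTS = {
    ("Test", "hasMeasurement", "Measurement"): 120,
    ("Measurement", "hasUnit", "Unit"): 120,
    ("Test", "performedBy", "Person"): 40,
}


def find_paths(new_class, nodegroup_classes, edges, max_hops=3):
    """Breadth-first search from new_class to any class already in the nodegroup.

    Edges are followed in either direction. Each path is a list of triples.
    """
    paths = []
    queue = deque([(new_class, [], {new_class})])
    while queue:
        current, path, seen = queue.popleft()
        if path and current in nodegroup_classes:
            paths.append(path)
            continue  # stop extending once a nodegroup class is reached
        if len(path) >= max_hops:
            continue
        for triple in edges:
            domain, _, rng = triple
            if domain == current:
                nxt = rng
            elif rng == current:
                nxt = domain
            else:
                continue
            if nxt not in seen:  # avoid revisiting a class within one path
                queue.append((nxt, path + [triple], seen | {nxt}))
    return paths


# "Model" style: every edge in the ontology is a candidate.
model_paths = find_paths("Lab", {"Test"}, ONTOLOGY)

# "Predicate data" style: keep only edges that have instance data.
data_edges = [e for e in ONTOLOGY if INSTANCE_COUNTS.get(e, 0) > 0]
data_paths = find_paths("Lab", {"Test"}, data_edges)

print(len(model_paths), "model paths;", len(data_paths), "data-backed paths")
```

In this toy data, the ontology alone connects Lab to Test through two different intermediate classes, while filtering by instance data leaves Lab with no connection at all, which is the situation in which a dragged class ends up disconnected.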

Timeouts and incomplete results

All path-finding modes have timeouts. They may return before finding any paths at all, or return a different number of paths for the same operation on different occasions (due to changes in system and triplestore performance). They should be expected to provide quality suggestions, but not necessarily to find every possible path.
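The reason a timeout leads to varying results can be sketched as follows. This is a minimal illustration, not SemTK code: `collect_until_deadline` and `counting_clock` are hypothetical names, and the fake clock stands in for real elapsed time so that the example is deterministic.

```python
import time


def collect_until_deadline(candidates, timeout_s, clock=time.monotonic):
    """Collect candidates until the deadline passes.

    The clock is checked after each candidate is produced, so a candidate
    that arrives after the deadline is discarded.
    Returns (found, timed_out).
    """
    deadline = clock() + timeout_s
    found = []
    for item in candidates:
        if clock() >= deadline:
            return found, True
        found.append(item)
    return found, False


def counting_clock():
    """Hypothetical test clock that advances by one 'second' per call."""
    ticks = iter(range(1, 1000))
    return lambda: next(ticks)


paths = ["path-1", "path-2", "path-3", "path-4"]

# Same search, different time budgets: a slower system behaves like a smaller budget.
short_run = collect_until_deadline(paths, timeout_s=3, clock=counting_clock())
long_run = collect_until_deadline(paths, timeout_s=10, clock=counting_clock())
print(short_run)
print(long_run)
```

The same list of candidate paths yields a partial result under the tight budget and the full result under the generous one, which mirrors how a slower triplestore on a given day can reduce the number of paths returned.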

Predicate and nodegroup data modes use a cache of information about the ontology that may be refreshed occasionally, and this may take a noticeable amount of extra time. Normally this will occur at most once per session, and other path-finding attempts will run without this delay.

Bypass automatic path-finding

Path-finding can be bypassed by holding down Shift while dragging a class onto the canvas.

Subsequently clicking on an object property in the nodegroup will allow connections to be built manually.

Common questions

Dragging on a class leaves it disconnected

This happens when no path is found. It is most common in predicate data mode, when there is no instance data to connect the new class to the existing nodegroup; switching to model mode may help. It can also happen when path-finding is too complex and the system does not find a connection within the timeout interval.

Choosing "predicate data" or "nodegroup data" mode results in a wait

This wait is the system caching statistics about the predicates in the instance data. This means either the cache has aged out, or the data or model has changed since the cache was last built. This will not happen every time.
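The rebuild conditions described above (an aged-out cache, or changed data or model) can be sketched as a small cache class. This is an illustration under assumptions, not the SemTK cache: `PredicateStatsCache`, the `fingerprint` argument used to signal that data or model has changed, and the 60-second maximum age are all hypothetical.

```python
import time


class PredicateStatsCache:
    """Hypothetical cache that rebuilds when too old or when the data fingerprint changes."""

    def __init__(self, build_stats, max_age_s, clock=time.monotonic):
        self._build = build_stats      # expensive function that computes the statistics
        self._max_age = max_age_s
        self._clock = clock
        self._stats = None
        self._built_at = None
        self._fingerprint = None
        self.rebuild_count = 0         # exposed only so the example can show rebuilds

    def get(self, fingerprint):
        now = self._clock()
        stale = (
            self._stats is None
            or now - self._built_at > self._max_age
            or fingerprint != self._fingerprint
        )
        if stale:
            # This is the step a user would experience as a wait.
            self._stats = self._build()
            self._built_at = now
            self._fingerprint = fingerprint
            self.rebuild_count += 1
        return self._stats


# Deterministic demo with a manually advanced clock (an assumption for testing).
now = [0.0]
cache = PredicateStatsCache(lambda: {"hasUnit": 120}, max_age_s=60, clock=lambda: now[0])

cache.get("v1")   # first use: builds the cache
cache.get("v1")   # fresh and unchanged: no rebuild
now[0] = 61.0
cache.get("v1")   # aged out: rebuilds
cache.get("v2")   # fingerprint changed: rebuilds
print(cache.rebuild_count)
```

Only the first call, the call after the cache ages out, and the call after the fingerprint changes pay the rebuild cost; every other call returns immediately.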

Data seems out-of-date

If changes are made to the ontology or data through direct SPARQL calls that bypass the SemTK service layer, the data used by path-finding may be stale. In this case, reload the connection with the clear cache checkbox at the bottom of the dialog selected.