Future Directions - ge-semtk/semtk GitHub Wiki

This page collects ideas for future improvements and new directions for SPARQLgraph.

In memory graph

In addition to Fuseki, Virtuoso, and Neptune, support a server-side in-memory graph.

As of 2021, an in-memory graph is used internally to improve ingestion performance.

Generate queries using Jena library

Instead of the current custom Java code, use the Jena library to generate a SPARQL query from a nodegroup. This work could be divided up by query type, for example starting with generating a SELECT query from a nodegroup.

SemTK "ontology"/model auto-generation and checking

SemTK loads a model by querying classes, sub-class relationships, property domains and ranges, and notes and aliases. It would be useful to try to generate such a model from instance data. In a slight variation of this task, it could be useful to check data against a model and report extra, missing, or contradictory relationships between the data and the model.
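The data-versus-model check can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `PropertyDef` type, the `check_against_model` function, the `ex:` names, and the simplification that each instance has exactly one declared type. Sub-class relationships and datatype properties are ignored. It is not SemTK code.

```python
from typing import NamedTuple

RDF_TYPE = "rdf:type"


class PropertyDef(NamedTuple):
    """Hypothetical, simplified model entry for one object property."""
    domain: str  # class the property is declared on
    range: str   # class expected for the object


def check_against_model(triples, model):
    """Return (triple, problem) pairs for triples that disagree with the model.

    triples: list of (subject, predicate, object) strings.
    model:   dict mapping property name -> PropertyDef.
    """
    # Declared type of each instance (assumes one type per instance).
    types = {s: o for s, p, o in triples if p == RDF_TYPE}
    problems = []
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        if p not in model:
            problems.append(((s, p, o), "property not in model"))
            continue
        prop = model[p]
        if types.get(s) != prop.domain:
            problems.append(((s, p, o), "subject type does not match domain"))
        if types.get(o) != prop.range:
            problems.append(((s, p, o), "object type does not match range"))
    return problems


if __name__ == "__main__":
    model = {"ex:hasEngine": PropertyDef("ex:Car", "ex:Engine")}
    triples = [
        ("ex:car1", RDF_TYPE, "ex:Car"),
        ("ex:eng1", RDF_TYPE, "ex:Engine"),
        ("ex:car1", "ex:hasEngine", "ex:eng1"),  # agrees with the model
        ("ex:eng1", "ex:hasEngine", "ex:car1"),  # domain and range reversed
        ("ex:car1", "ex:hasColor", "ex:red"),    # property unknown to the model
    ]
    for triple, problem in check_against_model(triples, model):
        print(triple, "->", problem)
```

A real check would also need to follow sub-class relationships and report relationships the model requires but the data lacks.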

Show Class Restrictions

The left pane showing a model does not reflect restrictions such as cardinality, and they are not taken into account when queries are generated. This information could be loaded via another SPARQL query during the load process. It could then be stored in the OntologyInfo, displayed, and referenced during query generation.

It would be quite useful after a large data load to run cardinality checks to make sure data matches the model.
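A post-load cardinality check could look roughly like the following Python sketch. It covers only maximum cardinality and works on an in-memory list of triples. The function name `check_max_cardinality`, the dictionary of limits, and the `ex:` names are hypothetical and are not part of SemTK or OntologyInfo.

```python
from collections import Counter


def check_max_cardinality(triples, max_cardinality):
    """Find subjects that have more values for a property than the model allows.

    triples:         iterable of (subject, predicate, object) tuples.
    max_cardinality: dict mapping property name -> maximum values per subject
                     (hypothetical stand-in for restrictions read from the model).
    Returns a dict mapping (subject, property) -> actual count, violations only.
    """
    # An RDF graph is a set of triples, so duplicates are dropped before counting.
    counts = Counter(
        (s, p) for s, p, _ in set(triples) if p in max_cardinality
    )
    return {
        key: n for key, n in counts.items() if n > max_cardinality[key[1]]
    }


if __name__ == "__main__":
    triples = [
        ("ex:car1", "ex:hasEngine", "ex:eng1"),
        ("ex:car1", "ex:hasEngine", "ex:eng2"),  # second engine breaks max 1
        ("ex:car2", "ex:hasEngine", "ex:eng3"),
    ]
    print(check_max_cardinality(triples, {"ex:hasEngine": 1}))
```

Minimum cardinality is not covered here, because it also requires the list of instances of each restricted class, including those with no value for the property at all.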

Editing the Model

The graphical tree representation of the model in the left pane could be expanded from read-only to read-write, so that the model could be built and edited inside SPARQLgraph. Challenges include writing a SADL file, if that was the source of the model, and writing the new model back to OWL and to the triple store.

Generic data viewer

Given a nodegroup with runtime constraints, a web page could automatically display drop-down menus and filters. After a query is executed, the user could be given the choice to display results in visJs (or a similar open source package) or in any number of chart formats.

In an advanced version, such "dashboards" could be saved and accessed by URL. This is essentially a simple dashboard app-builder.

As of 2021, the support for visual display of CONSTRUCT query results gets us part of the way to this goal.
Further, there is some integration with Plot.ly, which makes it easy to build Dash applications.

More efficient data ingestion

Current ingestion is powerful but quite slow due to the performance of SPARQL INSERT on most platforms. Explore a redesign where ingestion instead produces OWL/RDF or Turtle files and uploads them.

As of 2021, ingestion with 2 URI lookups against a local Fuseki db has been improved
to about a factor of 2 over a custom Python script that first looks up all URIs, then writes and ingests Turtle.
Performance will still vary across triplestores, and
SPARQL INSERT optimizations can now be made on a triplestore-by-triplestore basis.
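The file-based redesign mentioned in this section, where ingestion writes Turtle instead of issuing SPARQL INSERT, can be sketched in Python as follows. This is an illustration under stated assumptions, not the SemTK ingestion code: the function names, the `example.org` IRIs, and the rule that every column becomes a plain string literal are all hypothetical. The output uses only full IRIs and one triple per line, which is valid Turtle.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape a string for use inside a double-quoted Turtle literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_turtle(rows, instance_prefix, class_iri, id_column, property_iris):
    """Convert tabular rows into Turtle text, one triple per line.

    rows:            list of dicts, one per input row.
    instance_prefix: IRI prefix for new instances (assumes ids are IRI-safe).
    class_iri:       class assigned to every row's instance.
    id_column:       column whose value completes the instance IRI.
    property_iris:   dict mapping column name -> property IRI.
    """
    lines = []
    for row in rows:
        subject = f"<{instance_prefix}{row[id_column]}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{class_iri}> .")
        for column, property_iri in property_iris.items():
            value = row.get(column)
            if value is None or value == "":
                continue  # empty cells produce no triple
            literal = escape_literal(str(value))
            lines.append(f'{subject} <{property_iri}> "{literal}" .')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    rows = [
        {"id": "car1", "color": "red"},
        {"id": "car2", "color": ""},
    ]
    print(
        rows_to_turtle(
            rows,
            instance_prefix="http://example.org/data#",
            class_iri="http://example.org/model#Car",
            id_column="id",
            property_iris={"color": "http://example.org/model#color"},
        ),
        end="",
    )
```

The resulting text could be written to a file and loaded through the triplestore's bulk upload mechanism. URI lookups, typed literals, and links between instances are left out of this sketch.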