Home - ge-semtk/semtk GitHub Wiki

SemTK: Semantics Toolkit

SemTK is an open source project intended to provide easy interactions with semantic triplestores (RDF stores). It is built on the W3C semantic web stack.

It is composed of two main parts:

  • SemTK Java API / REST Services - code and services to facilitate interacting with semantic triplestore data (e.g. querying data, ingesting data)
  • SPARQLgraph - a Javascript-based graphical web application providing drag-and-drop access to many SemTK features

SemTK was developed by the Knowledge Discovery Lab at GE Research. Contact: Paul Cuddihy

SemTK is licensed under Apache 2. Please include our logo whenever possible.

Demos

See these Docker Demo instructions for a Docker demo courtesy of the RACK project. A variety of additional installation and demo options are listed in the RACK wiki.

Once you have the project running your favorite way, check out the "Hello World" wiki page.

Key Features

Below are some of the features that SemTK provides:

  • SPARQL query generation & execution (supports Virtuoso, Fuseki, Neptune, Jena triplestores, extensible to other SPARQL 1.1 stores)
  • Ingestion of tabular data with drag-and-drop templates and/or auto-generated class templates
  • Path-finding during drag-and-drop query building, based on model and/or instance data
  • Storing queries by id
  • Utility functions (e.g. loading OWL/TTL files, clearing data)
  • Instance data browsing

The tool is designed for triplestores with an ontology-based model. We use SADL for ontology authoring.

Latest Additions

2023

  • quantified cardinality restriction explorer: calculate restrictions and visually explore the instance data with auto-generated CONSTRUCT queries
  • data-overload protections when exploring CONSTRUCT results
  • 'manifest' ingestion packages
  • Python command-line interface
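
To illustrate what a cardinality check over instance data involves, here is a minimal Python sketch. It is not SemTK's implementation: the function name `check_cardinality`, the tuple-based triples, and the example property names are all hypothetical, and the sketch ignores the class qualifier that a full restriction would also test.

```python
from collections import defaultdict

def check_cardinality(triples, instances, prop, min_count=0, max_count=None):
    """Return {instance: value_count} for instances that violate the bounds.

    triples   -- iterable of (subject, predicate, object) tuples (hypothetical format)
    instances -- the instances the restriction applies to
    prop      -- the restricted property
    """
    # Collect the distinct values of `prop` for each subject.
    values = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == prop:
            values[subject].add(obj)

    violations = {}
    for inst in instances:
        count = len(values.get(inst, ()))
        too_few = count < min_count
        too_many = max_count is not None and count > max_count
        if too_few or too_many:
            violations[inst] = count
    return violations

# Hypothetical instance data: every car should have exactly 4 wheels.
triples = [
    ("car1", "hasWheel", "w1"), ("car1", "hasWheel", "w2"),
    ("car1", "hasWheel", "w3"), ("car1", "hasWheel", "w4"),
    ("car2", "hasWheel", "w5"),
]
violations = check_cardinality(triples, ["car1", "car2"], "hasWheel",
                               min_count=4, max_count=4)
```

Here `violations` holds only `car2`, which has a single wheel. Instances with no values at all would be reported with a count of 0.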

2022

  • SADL datatypes and their restrictions are now handled, with restrictions enforced during ingestion
  • CSV handling has been upgraded to follow the full CSV standard. In particular: quotes are escaped with "", embedded newlines are supported, and escape sequences like \n are preserved
  • Select query column order can be saved by dragging the columns and hitting the Save Column Order button
  • Report generation using the "Report Tab" on SPARQLgraph allows a set of nodegroups to be run with success and failure criteria, saved as a report specification, and re-run as needed. Report results can be downloaded as HTML.
  • Report generation includes cardinality checker
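
The CSV conventions mentioned above (quotes escaped with "", embedded newlines, escape sequences such as \n left untouched) can be illustrated with Python's standard `csv` module. This is only a sketch of the conventions themselves, not of SemTK's ingestion code, and the sample data is made up.

```python
import csv
import io

# Sample CSV text (hypothetical data):
#  - row 1: a doubled quote ("") stands for one literal quote inside a quoted field
#  - row 2: a quoted field may contain a real line break
#  - row 3: the two characters backslash + n are kept as-is, not turned into a newline
raw = (
    'id,comment\r\n'
    '1,"She said ""hello"""\r\n'
    '2,"first line\nsecond line"\r\n'
    '3,"literal \\n stays as two characters"\r\n'
)

# newline="" stops StringIO from translating line endings before csv sees them.
rows = list(csv.reader(io.StringIO(raw, newline="")))

for row in rows:
    print(row)
```

The parser returns four rows (header plus three records), even though the raw text spans five physical lines.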

2021

  • Graph exploration by running a CONSTRUCT query, then clicking any node to show all connected data
  • Support of aggregate functions (MAX, MIN, etc.) and GROUP BY
  • Overhauled Explore Tab with graphical display of predicate counts between classes.
  • Instance-based path-finding. See the Path-finding wiki page.
  • Added support for EXISTS on data properties.
  • Plotting integration via Plotly. See the Plotting wiki page.
  • SemTK now supports Blazegraph. See the Triplestores wiki page.
  • Import mapping wizard helps quickly build ingestion templates. See the Ingestion mapping wizard wiki page.

2020

  • support for UNION queries (see the UNION queries wiki page)
  • visJs display of CONSTRUCT query results in SPARQLgraph
  • moving EDC (external data connections) to open source
  • moving FDC (federated data connections) and FDCCache to open source
  • improved ingestion speed using Jena in-memory cache

A Quick Look

This section is intended to give a quick idea of the most basic functionality of SemTK and SPARQLgraph.

Load a Connection

To establish a connection to one or more data graphs, choose connection->load from the main menu.

The user first specifies which triplestore to connect to using the dialog below. The Server URL contains the location of the triplestore. Each connection may have one or more datasets (each a named graph within the given triplestore) containing the ontology, and one or more datasets containing instance data. The OWL Imports checkbox indicates that SemTK should recursively load datasets that are referenced as imports in the ontology.

The clear cache checkbox forces the ontology to be re-read from the triplestore, bypassing the cache mechanism in the service layer.
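
The pieces of a connection described above can be pictured as a small data structure. The following Python sketch is purely illustrative: the class and field names (`GraphRef`, `Connection`, `owl_imports`, and so on) and the example URLs are assumptions made for this example and are not SemTK's actual connection format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GraphRef:
    """One dataset: a named graph within a given triplestore (hypothetical type)."""
    server_url: str  # location of the triplestore
    graph: str       # name of the graph inside that triplestore

@dataclass
class Connection:
    """A named connection with ontology and instance-data datasets (hypothetical type)."""
    name: str
    model: List[GraphRef] = field(default_factory=list)  # datasets holding the ontology
    data: List[GraphRef] = field(default_factory=list)   # datasets holding instance data
    owl_imports: bool = False  # follow ontology imports recursively when loading

    def is_valid(self) -> bool:
        # The text calls for at least one ontology dataset and one data dataset.
        return len(self.model) >= 1 and len(self.data) >= 1

    def all_graphs(self) -> List[str]:
        # Every graph a query would run against, in order, without duplicates.
        seen: List[str] = []
        for ref in self.model + self.data:
            if ref.graph not in seen:
                seen.append(ref.graph)
        return seen

# Example with made-up server and graph names.
conn = Connection(
    name="demo",
    model=[GraphRef("http://localhost:3030/DEMO", "http://example.org/model")],
    data=[GraphRef("http://localhost:3030/DEMO", "http://example.org/data")],
    owl_imports=True,
)
```

Keeping ontology and instance datasets in separate lists mirrors the distinction the dialog makes, while `all_graphs()` reflects that a query runs against both.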

After a connection is loaded, the main SPARQLgraph screen might look like this.

The Ontology Info Pane (top left) shows a subset of the ontology, including classes, subclass relationships, properties, property domain/range, and enumerations. Mousing over an item will display a tooltip with the item's full URI, and any aliases or notes.

The user can drag classes into the Nodegroup Pane (top right), and then select properties to return, delete, constrain, etc. The selected classes/properties/options are referred to in SemTK as a nodegroup. Using the nodegroup, as well as the corresponding connection and ontology info, the tool generates a query that matches the semantic model and runs it against all ontology/instance datasets in the connection.

The generated SPARQL query (e.g. INSERT, COUNT, DELETE) will be shown in the Query Pane (middle). This SPARQL will include subclass inference (subClassOf*) for any class that has subclasses in the ontology, and subproperty inference (subPropertyOf*) for any property that has subproperties in the ontology.
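
As a rough illustration of subclass inference in a generated query, the Python sketch below builds a SELECT query that uses the SPARQL 1.1 property path `rdfs:subClassOf*` only when the class has subclasses. The function names, the variable naming scheme, and the example class URI are assumptions for this example; the SPARQL that SemTK actually generates may be structured differently.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

def class_pattern(var: str, class_uri: str, has_subclasses: bool) -> str:
    """Return the triple pattern(s) binding ?var to instances of class_uri."""
    if has_subclasses:
        # Match the class itself or any descendant: the * path operator
        # follows zero or more rdfs:subClassOf links.
        return (
            f"?{var} a ?{var}_type . "
            f"?{var}_type rdfs:subClassOf* <{class_uri}> ."
        )
    # No subclasses in the ontology, so a direct type match is enough.
    return f"?{var} a <{class_uri}> ."

def build_select(var: str, class_uri: str, has_subclasses: bool) -> str:
    """Wrap the class pattern in a minimal SELECT DISTINCT query."""
    body = class_pattern(var, class_uri, has_subclasses)
    return (
        f"PREFIX rdfs: <{RDFS}>\n"
        f"SELECT DISTINCT ?{var} WHERE {{\n"
        f"  {body}\n"
        f"}}"
    )

# Hypothetical class URI for illustration.
with_inference = build_select("Animal", "http://example.org/zoo#Animal", True)
without_inference = build_select("Cage", "http://example.org/zoo#Cage", False)
print(with_inference)
```

Subproperty inference follows the same idea, with `rdfs:subPropertyOf*` applied to an intermediate property variable.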

After the query is executed, results are shown in the Results Pane (bottom).