Class or Workshop Setup - ge-semtk/semtk GitHub Wiki

Introduction

This section presumes some basic familiarity with SemTk as might be gained through the Demo and Hello World pages. It lays out basic considerations and instructions for group exercises using SPARQLgraph to learn semantic web technologies. It also gives some topic areas for exploration.

Install local copy of Opensource Virtuoso

Background: basic architecture of SemTk:

your browser loads JavaScript from semtk.research.ge.com
browser executes JavaScript and makes REST calls to semtk.research.ge.com
semtk.research.ge.com services make HTTP calls to a triplestore

semtk.research.ge.com provides:

up-to-date SPARQLgraph app and Java REST services running 24x7 on a single AWS node
an Opensource Virtuoso instance that is cleaned out and reset every night
data and nodegroups (queries and ingest templates) can be saved to triplestore via the nodegroup store

Given this architecture, the AWS node is likely to support a group of several dozen concurrent users without issue. However, the triplestore is only meant to be temporary. You will need to use your own instance of a triplestore in order to save data and nodegroups inside the system.

Installing your own triplestore: Opensource Virtuoso

Requirements:

a machine with a static IP address or fully qualified domain name that is accessible from the outside world.

If you do not have this, you'll need to install a local copy of the entire SemTk system as described in "Installing"

Installing:

Virtuoso installation instructions for Linux or Windows are found on the Virtuoso site

Testing:

open SPARQLgraph from semtk.research.ge.com
pull down the connection->load menu; the connection dialog appears

Follow these steps to test your installation of Virtuoso with SemTk:

use the New button to create a new connection
Name: give your connection a name
Domain: "http://" This causes SPARQLgraph to treat every URI in the model graph that starts with this prefix as T-box, which is essentially every URI.
Server URL: http://YOUR_NEW_INSTALLATION_MACHINE:2420 This tells SPARQLgraph where to find the Virtuoso server.
Type: "Virtuoso"
Dataset: http://your_graph/model By clicking on each "1", you can set a different graph for the model and for the data.

After hitting Submit, you should get a message

Top-level class query returned no rows. Dataset is empty.

This means your connection succeeded and no model is loaded yet.
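A similar check can be approximated outside SPARQLgraph. The sketch below assumes that Virtuoso exposes its standard SPARQL endpoint at /sparql under the Server URL entered above; it builds a SPARQL protocol GET request that counts the triples in a graph. The function names, the host, and the graph name are placeholders for illustration, and this is not how SPARQLgraph itself performs its test.

```python
import json
import urllib.parse
import urllib.request


def build_count_url(server_url: str, graph: str) -> str:
    """Return a SPARQL-protocol GET URL that counts the triples in `graph`."""
    query = f"SELECT (COUNT(*) AS ?n) FROM <{graph}> WHERE {{ ?s ?p ?o }}"
    params = urllib.parse.urlencode({
        "query": query,
        # Ask for JSON results; assumed to be accepted by the endpoint.
        "format": "application/sparql-results+json",
    })
    return f"{server_url.rstrip('/')}/sparql?{params}"


def count_triples(server_url: str, graph: str, timeout: float = 10.0) -> int:
    """Run the count query; needs network access to the Virtuoso server."""
    url = build_count_url(server_url, graph)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    return int(body["results"]["bindings"][0]["n"]["value"])
```

Calling count_triples("http://YOUR_NEW_INSTALLATION_MACHINE:2420", "http://your_graph/model") against a fresh installation would be expected to return 0, which corresponds to the "Dataset is empty" message.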

Build and load models and data

The Hello World SPARQLgraph wiki page contains instructions on how to build and load models using SADL or Protege. It also explains how to import and load data and how to query it.

Working separately & sharing results

Read-only graphs

It requires significant Virtuoso administrator skills and overhead to set up Virtuoso such that each participant has their own graphs to which only they can write. The most likely default is that all users who have access to your Virtuoso instance will have access to all the graphs. There are several strategies that could be used to manage this situation:

users can install their own local copies of Virtuoso
users can use an honor system not to erase each other's data
each user can be given a GUID and be instructed to create graphs whose names start with http://GUID. Provided that the default Virtuoso conductor passwords have been changed, it would be very difficult to find or modify others' data using this scheme.
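The GUID scheme can be sketched as follows. This is a minimal illustration, assuming one model graph and one data graph per participant; the /model and /data suffixes mirror the http://your_graph/model example above, and the function name is hypothetical.

```python
import uuid
from typing import Dict, Optional


def make_graph_names(guid: Optional[str] = None) -> Dict[str, str]:
    """Return model and data graph names under a per-participant GUID prefix."""
    guid = guid or str(uuid.uuid4())  # random GUID unless one is supplied
    prefix = f"http://{guid}"
    return {"model": f"{prefix}/model", "data": f"{prefix}/data"}


if __name__ == "__main__":
    # Print one pair of graph names to hand to a participant.
    for name, graph in make_graph_names().items():
        print(f"{name}: {graph}")
```

An instructor could run this once per participant and hand out the resulting pair of graph names for use in the connection dialog.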

Submitting nodegroups for review

Nodegroups can be saved into the Nodegroup Store using the menu pick Nodegroup->save to store. The saved nodegroup will contain all connection and import template information. However, this approach will make the nodegroup readable and writable by all users, which could potentially lead to problems as discussed above.

Nodegroups can also be saved to JSON via Nodegroup->download. These JSON files can be stored and emailed like any other file. When a user drags and drops them onto the main SPARQLgraph canvas, the nodegroup is loaded into SPARQLgraph. The user is prompted whether to also switch to the nodegroup's connection, or to remain connected to currently open graphs.

Topics and Exercises

To begin using the semantic web, users will need to understand the basics of modelling as manifested in SADL or Protege. SemTk reads only model features needed for SPARQL-generation, which includes class/subclass relationships, data and object properties, and their domains and ranges.
With an OWL/RDF model and data such as that shown in Hello World, users can explore and understand SPARQL using the SPARQLgraph interface. SPARQLgraph generates SELECT, COUNT, and DELETE queries. The chaining of clauses into a graph pattern may be particularly easy to understand in the presence of the graphical representation which SPARQLgraph provides. Note that the dropdown menu under the main canvas allows the user to select the query type. The middle pane shows the generated SPARQL, and the bottom pane shows any results.
Click on a data property and observe the dialog's functionality for building FILTER and VALUES clauses. Can you explain how the SPARQL generation engine might need to modify a query in order to support the Suggest Values function?
Try examples of MINUS and OPTIONAL on data properties.
Clicking on connecting arrows between classes allows for the use of MINUS and OPTIONAL on object properties. Explore the meaning of REVERSE MINUS and REVERSE OPTIONAL. What happens when these are applied to a subtree with multiple class nodes?
Similarly, object property arrows allow use of the ? + * operators.
DELETE queries. When a class is clicked upon, the resulting dialog lists delete modes. Observe and explain the difference between FULL_DELETE and TYPE_INFO_ONLY.
The Ingestion wiki page describes ingesting data and URI lookups.
What are the advantages and disadvantages of human-readable URIs vs. using GUIDs?
Try adding voltage to battery and ingesting values.
Perform the above using URI lookup.
This toolkit makes strong assumptions about A-box and T-box. Understand where these assumptions might break down.
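As a reference point for the FILTER, VALUES, OPTIONAL, and MINUS exercises above, the sketch below assembles four small hand-written SPARQL queries as Python strings. The ex: namespace and the Battery, name, and voltage terms are hypothetical, and the text is not what SPARQLgraph generates; compare it with the SPARQL shown in the middle pane.

```python
PREFIX = "PREFIX ex: <http://example.org/battery#>"  # hypothetical namespace


def select_query(constraint: str) -> str:
    """Wrap one constraint in a SELECT over hypothetical ex:Battery instances."""
    return (
        f"{PREFIX}\n"
        "SELECT * WHERE {\n"
        "  ?battery a ex:Battery .\n"
        f"  {constraint}\n"
        "}"
    )


# FILTER: keep only rows whose voltage satisfies a condition.
filter_query = select_query(
    "?battery ex:voltage ?voltage . FILTER(?voltage > 1.5)"
)

# VALUES: restrict ?name to an explicit list of allowed values.
values_query = select_query(
    'VALUES ?name { "AA" "9V" } ?battery ex:name ?name .'
)

# OPTIONAL: return every battery; ?voltage stays unbound when absent.
optional_query = select_query("OPTIONAL { ?battery ex:voltage ?voltage . }")

# MINUS: return only batteries for which no voltage is recorded.
minus_query = select_query("MINUS { ?battery ex:voltage ?voltage . }")

if __name__ == "__main__":
    for q in (filter_query, values_query, optional_query, minus_query):
        print(q, end="\n\n")
```

Pasting these queries into a SPARQL endpoint loaded with a comparable model is one way to check an explanation of each clause before experimenting with the REVERSE variants in SPARQLgraph.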