Hello World SPARQLgraph - ge-semtk/semtk GitHub Wiki

This page contains a tutorial that takes you through many of the main features of SPARQLgraph in 15 to 30 minutes.

Bring up SPARQLgraph

Start by opening the SPARQLgraph demo site in a new tab: semtk.research.ge.com

SPARQLgraph URL parameters

SPARQLgraph accepts these URL parameters:

  • conn - the JSON rendering of a connection to load, or to use in place of the connection stored in the nodegroup
  • nodegroupId - the name of a stored nodegroup to load and, by default, execute
  • runFlag - if False, load the nodegroupId nodegroup without running its query
  • constraints - the JSON rendering of runtime constraints that match the nodegroup
  • reportId - load this report when starting SPARQLgraph
  • exploreRestrictions - open the restrictions exploration page on startup

The semtk-python3 project provides a get_sparqlgraph_url() function which will build URLs with the JSON parameters properly formed.
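If you are not using semtk-python3, the same idea can be sketched with Python's standard library. This is a minimal sketch, not get_sparqlgraph_url() itself: the page address SPARQLGRAPH_PAGE and the helper build_sparqlgraph_url are hypothetical, and it assumes the parameters travel as an ordinary query string, with JSON values produced by json.dumps.

```python
import json
from urllib.parse import urlencode

# Hypothetical address: substitute the SPARQLgraph page you actually use.
SPARQLGRAPH_PAGE = "https://example.com/sparqlGraph.html"


def build_sparqlgraph_url(base, nodegroup_id=None, conn=None, run_flag=None,
                          constraints=None, report_id=None):
    """Build a SPARQLgraph URL from the documented parameter names.

    conn and constraints are Python objects serialized with json.dumps;
    urlencode then percent-encodes every value.
    """
    params = {}
    if conn is not None:
        params["conn"] = json.dumps(conn)
    if nodegroup_id is not None:
        params["nodegroupId"] = nodegroup_id
    if run_flag is not None:
        # Assumed spelling of the boolean; compare with get_sparqlgraph_url().
        params["runFlag"] = "true" if run_flag else "false"
    if constraints is not None:
        params["constraints"] = json.dumps(constraints)
    if report_id is not None:
        params["reportId"] = report_id
    return base + "?" + urlencode(params) if params else base
```

For example, build_sparqlgraph_url(SPARQLGRAPH_PAGE, nodegroup_id="batteries", run_flag=False) returns "https://example.com/sparqlGraph.html?nodegroupId=batteries&runFlag=false". When semtk-python3 is available, prefer its get_sparqlgraph_url().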

Connect to the demo Virtuoso triple store

The SPARQLgraph demo site includes a running instance of Virtuoso.

This site is for temporary storage only. It is refreshed daily at 18:00 UTC.

Connect SPARQLgraph to the demo Virtuoso server by opening the File->Load menu.

  1. Click the "New" button, then click "OK" on the message dialog that appears if you do not yet have any other connection information.

  2. Enter these values:

    • Name: "Batteries" or some related name
    • Enable OWL Imports: can remain checked
    • Server URL: http://localhost:2420 (this tells SPARQLgraph that Virtuoso is on the same server as the SPARQLgraph services)
    • Type: "Virtuoso"
    • Dataset: make something up. It should be unique so you don't collide with other users; this is a demo server, so there are no guarantees. e.g. http://battery_demo/something/unique (note that the protocol and ':' are required)
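Picking a dataset name by hand invites collisions on a shared server. As a minimal sketch, assuming a random suffix is acceptable for your dataset name, the hypothetical helpers below generate a unique dataset URI and check the rule above that it must carry a protocol such as "http:".

```python
import uuid
from urllib.parse import urlparse


def make_dataset_uri(prefix="http://battery_demo"):
    """Return a dataset URI that is unlikely to collide with other users,
    e.g. http://battery_demo/<32 random hex digits>."""
    return f"{prefix}/{uuid.uuid4().hex}"


def has_protocol(uri):
    """True if the URI starts with a scheme followed by ':' (e.g. 'http:')."""
    return bool(urlparse(uri).scheme)
```

Paste the result of make_dataset_uri() into the Dataset field; has_protocol("battery_demo/something/unique") returns False because the protocol is missing.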

Now hit Submit.

You've now connected to Virtuoso with a named graph.

You should see this message:

Top-level class query returned no rows. Dataset is empty.

This means no model has been loaded yet.
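For intuition about that message, here is a hypothetical query of the same flavor: it lists OWL classes in the named graph that have no named superclass other than owl:Thing. This is an illustrative sketch assuming standard OWL/RDFS vocabulary; it is not the query SPARQLgraph actually issues.

```python
def top_level_class_query(graph_uri):
    """Return a SPARQL query for OWL classes in graph_uri with no named
    superclass (other than owl:Thing). Illustrative only."""
    return f"""PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?class
FROM <{graph_uri}>
WHERE {{
  ?class a owl:Class .
  FILTER(isIRI(?class))
  FILTER NOT EXISTS {{ ?class rdfs:subClassOf ?super . FILTER(isIRI(?super) && ?super != owl:Thing) }}
}}"""
```

Against an empty dataset, a query like this returns no rows, which is the situation the message describes.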

Load a model

SPARQLgraph depends on your data being modeled. We love SADL, but you can use WebProtégé or even plain OWL.

Our sampleBattery.sadl file looks like this:

uri "http://kdl.ge.com/batterydemo" alias changeme version "$Revision:$ Last modified on   $Date:$". 

Color is a class, must be one of {red, white, blue}.

Cell is a class, 
	described by cellId with a single value of type string,
	described by color with a single value of type Color.
	
Battery is a class,
	described by name with a single value of type string,
	described by cell with values of type Cell.	 

This SADL is compiled into an OWL model file. A copy can be found in sampleBattery.owl.
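To read the model's cardinality and enumeration rules, it may help to see the same structure as Python types. This is only an analogy, not code generated from the SADL: "must be one of" becomes a closed Enum, "with a single value" becomes a required field, and "with values" becomes a list that may hold zero or more cells.

```python
from dataclasses import dataclass, field
from enum import Enum


class Color(Enum):
    # "must be one of {red, white, blue}": a closed set of allowed values
    RED = "red"
    WHITE = "white"
    BLUE = "blue"


@dataclass
class Cell:
    cell_id: str   # "with a single value of type string"
    color: Color   # "with a single value of type Color"


@dataclass
class Battery:
    name: str                                          # single string value
    cells: list[Cell] = field(default_factory=list)    # "with values of type Cell"
```

For example, Battery("battery1", [Cell("A20", Color("red"))]) is valid, while Color("green") raises ValueError because green is not in the enumeration.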

To load this model:

  • drag the OWL file onto "Drop OWL file"
  • hit the "Upload Owl" button.

Now a Reload button will appear. Press it, then use the 'Query' button at the top to go back to the query tab.

Build a simple query

Back on the query tab, hit the Expand button or click through the tree to open and close items. You'll see the model that you just uploaded.

Drag Battery to the canvas on the right, then drag Color. Separate the boxes so you can see them all clearly. Notice that path-finding determined that the only path from Battery to Color passes through Cell, so that class was added to the query as well.
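The path-finding step can be pictured as a shortest-path search over the model's object properties. The sketch below runs breadth-first search over a hand-written edge list taken from sampleBattery.sadl, following each property only from its domain to its range; it illustrates the idea and is not SemTK's path-finding algorithm.

```python
from collections import deque

# Object properties from sampleBattery.sadl as (domain class, property, range class).
MODEL_EDGES = [
    ("Battery", "cell", "Cell"),
    ("Cell", "color", "Color"),
]


def find_path(start, goal, edges=MODEL_EDGES):
    """Breadth-first search for the shortest chain of properties from start to goal.

    Returns [class, property, class, ...] or None if goal is unreachable.
    """
    graph = {}
    for domain, prop, rng in edges:
        graph.setdefault(domain, []).append((prop, rng))
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for prop, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [prop, nxt]))
    return None
```

find_path("Battery", "Color") returns ["Battery", "cell", "Cell", "color", "Color"], which is why Cell appears on the canvas.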

Click each of these, and hit Submit:

  • ?Battery
  • ?Battery : name
  • ?Cell
  • ?Cell : cellId
  • ?Color

If you accidentally click "?Battery : Cell" or "?Cell : color" and create new nodes, simply click the black x to clear them.

All of these have been added to a "Select distinct" query. When you hit the Execute button, you'll see that the results are empty, since no data has been loaded yet. The Query section shows the text of the generated query.
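To get a feel for what the Query section displays, here is a hand-written approximation of a SELECT DISTINCT query over this nodegroup. It assumes the model namespace is the SADL uri followed by "#" and that the data lives in the named graph you chose as the Dataset; SPARQLgraph's generated text will differ in variable names, prefixes, and typing details.

```python
# Assumed namespace: the SADL model uri with "#" appended.
NS = "http://kdl.ge.com/batterydemo#"


def battery_cell_color_query(graph_uri):
    """Return an illustrative SELECT DISTINCT query for the Battery-Cell-Color
    nodegroup built above. Not SPARQLgraph's exact output."""
    return f"""PREFIX bd: <{NS}>
SELECT DISTINCT ?Battery ?name ?Cell ?cellId ?Color
FROM <{graph_uri}>
WHERE {{
  ?Battery a bd:Battery .
  ?Battery bd:name ?name .
  ?Battery bd:cell ?Cell .
  ?Cell a bd:Cell .
  ?Cell bd:cellId ?cellId .
  ?Cell bd:color ?Color .
}}"""
```

Every triple pattern must match for a row to be returned, which is worth keeping in mind when a battery later goes missing from the results.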

Load data

Now go to the Map Input tab to begin loading data. Notice that there is a row on the left for each possible value that could be loaded into the nodegroup you created on the Query tab.

Download sampleBattery.csv. It contains some data that looks like this:

battery name, cell id, color
battery1,A20,red
battery1,A21,white
battery1,A22,blue
battery2,,
blue one,B100,blue
blue one,B101,blue
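If you want to inspect the file before mapping it, Python's csv module reads it directly. This sketch embeds the sample rows shown above; skipinitialspace=True strips the spaces after the commas in the header line, and the hypothetical empty_columns helper flags blank cells such as battery2's.

```python
import csv
import io

# The sample rows from sampleBattery.csv, as shown above.
SAMPLE = """\
battery name, cell id, color
battery1,A20,red
battery1,A21,white
battery1,A22,blue
battery2,,
blue one,B100,blue
blue one,B101,blue
"""


def read_rows(text):
    """Parse CSV text into dicts keyed by header, ignoring spaces after commas."""
    return list(csv.DictReader(io.StringIO(text), skipinitialspace=True))


def empty_columns(row):
    """Return the names of the columns whose cell is empty in this row."""
    return [col for col, value in row.items() if not value]
```

read_rows(SAMPLE) yields six rows; for the battery2 row, empty_columns returns ["cell id", "color"].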

Drag the downloaded file onto "Drop CSV file" and all the column headers will appear on the right.

Create a Base URI

Open Options and enter "http://data" in the "Base URI" field.

Create some text fields

Click on +New and create two text fields:

  • "Battery_"
  • "Cell_"

These will be used in a moment to help name our battery and cell data.

Create a Transform

Click on +New to create a transform:

  • Name: RmSpace
  • Type: Replace All
  • search: \s
  • replace: _

This will be used to change spaces to underscores in some of the incoming data.
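The RmSpace transform behaves like a global regular-expression substitution. A minimal Python equivalent follows; it uses Python's re module, whose \s class may differ from the engine SPARQLgraph uses in edge cases such as non-ASCII whitespace.

```python
import re


def rm_space(value):
    """Mimic RmSpace (Replace All, search \\s, replace _) with Python's re."""
    return re.sub(r"\s", "_", value)
```

rm_space("blue one") returns "blue_one"; values without whitespace, such as "battery1", pass through unchanged.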

Build the mapping

Drag the columns, texts, and transform to the left until your screen looks like the picture below. Note that all items are dragged from right to left into the rows provided for each element of the nodegroup. The "RMSPACE" transform should be dropped directly on top of "BATTERY NAME".
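Since the picture of the finished mapping is not reproduced here, the following is a hypothetical reading of it: each Battery URI is the Base URI plus the "Battery_" text plus the battery name passed through RmSpace, and each Cell URI is the Base URI plus "Cell_" plus the cell id. The "/" separator and the helper names are assumptions; check the URIs you actually get in the query results.

```python
import re

BASE_URI = "http://data"  # the value entered under Options -> Base URI


def rm_space(value):
    """The RmSpace transform: replace every whitespace character with '_'."""
    return re.sub(r"\s", "_", value)


def battery_uri(row):
    # Assumed layout: Base URI + "/" + "Battery_" + RmSpace(battery name)
    return f"{BASE_URI}/Battery_{rm_space(row['battery name'])}"


def cell_uri(row):
    # Assumed layout: Base URI + "/" + "Cell_" + cell id
    return f"{BASE_URI}/Cell_{row['cell id']}"
```

Under these assumptions, the "blue one" rows map to http://data/Battery_blue_one, and cell B100 maps to http://data/Cell_B100.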

Import the data

Now go to the Import tab. At the bottom, the nodegroup is populated from your query, and the data file from your work on the Map Input tab. You can simply hit the Import button.

Now return to the Query tab and hit the Execute button at the top. Your data will appear at the bottom in the Results section.

Save

Use Nodegroup->download to save your work. The resulting JSON file can be dragged back onto the canvas, which restores:

  • the connection to Virtuoso
  • the query
  • the import mapping

Your data, of course, is stored in Virtuoso. Remember that on the demo Virtuoso server it is removed daily.

Explore import results

Make sure that the show namespace button is checked when you hit the Execute button, and look carefully at the data that was returned.

The following highlight important ingestion features:

  • all data URIs start with "http://data", the Base URI you entered under Options on the Map Input tab.
  • the RmSpace transform changed the space in the battery name "blue one" to an underscore before the name was used in the Battery URI.
  • the ingestion service looked up the proper URIs for colors based on the short color name. This was possible since Color was described by "must be one of" in SADL. Notice that the prefixes on the URIs in the Color column correspond to the model, not the data.
  • battery2 does not show up in the query results.
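The color lookup can be pictured as matching each short name against the enumeration and returning the model URI. This sketch assumes the model namespace is the SADL uri plus "#"; it is an illustration, not the ingestion service's code.

```python
# Assumed namespace: the SADL model uri with "#" appended.
MODEL_NS = "http://kdl.ge.com/batterydemo#"
# From "Color is a class, must be one of {red, white, blue}."
COLOR_VALUES = {"red", "white", "blue"}


def color_uri(short_name):
    """Map a short color name from the CSV to its URI in the model namespace."""
    if short_name not in COLOR_VALUES:
        raise ValueError(f"{short_name!r} is not one of {sorted(COLOR_VALUES)}")
    return MODEL_NS + short_name
```

color_uri("red") gives http://kdl.ge.com/batterydemo#red, a model URI rather than one under http://data; a value outside the enumeration, such as "green", raises ValueError.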

Why is a battery missing

The ingestion process attempts to insert a representation of the entire nodegroup shown on the Query tab for each line of input data. However, no triples are generated for empty cells in the input data, and any sub-tree left with no triples is pruned before the SPARQL INSERT query. This is what happened to battery2: since it has no cell or color, only two triples were generated, one declaring the new Battery instance's type and one recording its name, "battery2".
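The empty-cell and pruning rules can be sketched as a recursive walk over a simplified nodegroup. In this hypothetical structure Color is folded into a property of Cell for brevity, URIs use an assumed "data:" prefix, and a node whose URI column is empty yields no triples, so its whole sub-tree is dropped. This is an illustration, not SemTK's ingestion code.

```python
import re

# Hypothetical, simplified nodegroup. Each node names its class, the CSV column
# and text prefix used to build its URI, its data properties (property -> CSV
# column), and its children as (linking property, child node) pairs.
NODEGROUP = {
    "class": "Battery", "uri_col": "battery name", "uri_prefix": "Battery_",
    "props": {"name": "battery name"},
    "children": [
        ("cell", {
            "class": "Cell", "uri_col": "cell id", "uri_prefix": "Cell_",
            # Color is folded into a Cell property here for brevity.
            "props": {"cellId": "cell id", "color": "color"},
            "children": [],
        }),
    ],
}


def node_triples(node, row):
    """Return triples for node and its sub-tree, or [] if there is no data."""
    key = row[node["uri_col"]]
    if not key:
        # Empty cell: no URI can be built, so this sub-tree produces nothing.
        return []
    subject = "data:" + node["uri_prefix"] + re.sub(r"\s", "_", key)
    triples = [(subject, "rdf:type", node["class"])]
    for prop, col in node["props"].items():
        if row[col]:
            # Only non-empty cells generate data triples.
            triples.append((subject, prop, row[col]))
    for link, child in node["children"]:
        child_triples = node_triples(child, row)
        if child_triples:
            # Keep the link only when the child sub-tree produced triples (pruning).
            triples.append((subject, link, child_triples[0][0]))
            triples.extend(child_triples)
    return triples
```

For the battery2 row ({"battery name": "battery2", "cell id": "", "color": ""}) this returns exactly two triples, the type and the name, while a full row such as battery1/A20/red returns six.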

You can verify that battery2 was indeed ingested by removing the Cell and Color nodes from the query by clicking the black boxes, and re-executing the query.

What next

Although the instance of virtuoso running on the demo server is refreshed daily and not intended for long-term use, SPARQLgraph is maintained and available for public use.

A preferred way to use SPARQLgraph is to set up your own instance of Virtuoso and point the hosted SPARQLgraph at it. Its JavaScript front end runs on your computer; however, queries are performed through Java services running on the SPARQLgraph server, so your instance of Virtuoso must be visible from the public internet.

If you would like to point SPARQLgraph at a private dataset, or if you would like to contribute to the project, you can clone this repository and follow the installation instructions (the "Docs installing" wiki page) to set up your own local copy of SPARQLgraph.