Example: setting up a Linked Data Fragments experiment - rubensworks/jbr.js GitHub Wiki

This guide gives a quick example of how jbr can be used to initialize, prepare, and run an experiment.

0. Install

Requirements:

  • Node.js (12.0 or higher)
  • Docker (required for running the benchmark components as containers)

If you haven't done this yet, make sure to install jbr as follows:

$ npm install -g jbr

1. Initialize

Let's create an experiment for measuring the performance of Comunica over an LDF server (with NGINX cache) using the WatDiv benchmark:

$ jbr init watdiv ldf-performance

Initialized new experiment in /Users/rtaelman/experiments/ldf-performance

This experiment requires handlers for the following hooks before it can be used:
  - hookSparqlEndpoint
Initialize these hooks by calling 'jbr set-hook <hook> <handler>'

✨ Done in 11.98s

After executing this command, the ldf-performance directory will have been created, so let's navigate to it:

$ cd ldf-performance

It is highly recommended to add your experiment to a git repository and place it on a platform such as GitHub:

$ git init
$ git add .
$ git commit -m "Initial commit"

Note that a .gitignore has been created for you, so it is safe to just add all files to the git repo.

2. Set hooks

The output of the init command told us that we still need to configure a handler for the hookSparqlEndpoint hook, which will be the SPARQL endpoint our benchmark targets.

Let's plug in a hook for the LDF-based engine:

$ jbr set-hook hookSparqlEndpoint sparql-endpoint-ldf

Handler 'sparql-endpoint-ldf' has been set for hook 'hookSparqlEndpoint' in experiment 'ldf-performance'

This hook requires the following sub-hooks before it can be used:
  - hookSparqlEndpoint/hookSparqlEndpointLdfEngine
Initialize these hooks by calling 'jbr set-hook <hook> <handler>'

✨ Done in 2.80s

As the output shows, this hook requires another sub-hook, which is needed to execute SPARQL queries over the Triple Pattern Fragments interface of the LDF server.

For this, we can use the Comunica engine:

$ jbr set-hook hookSparqlEndpoint/hookSparqlEndpointLdfEngine sparql-endpoint-comunica

Handler 'sparql-endpoint-comunica' has been set for hook 'hookSparqlEndpoint/hookSparqlEndpointLdfEngine' in experiment 'ldf-performance'

3. Tweak experiment configuration

At this stage, your experiment is nearly executable. Optionally, you can configure your experimental setup by modifying the jbr-experiment.json file, which looks as follows:

{
  "@context": [
    "https://linkedsoftwaredependencies.org/bundles/npm/jbr/^0.0.0/components/context.jsonld",
    "https://linkedsoftwaredependencies.org/bundles/npm/@jbr-experiment/watdiv/^0.0.0/components/context.jsonld",
    "https://linkedsoftwaredependencies.org/bundles/npm/@jbr-hook/sparql-endpoint-comunica/^0.0.0/components/context.jsonld",
    "https://linkedsoftwaredependencies.org/bundles/npm/@jbr-hook/sparql-endpoint-ldf/^0.0.0/components/context.jsonld"
  ],
  "@id": "urn:jrb:comunica-performance",
  "@type": "ExperimentWatDiv",
  "datasetScale": 1,
  "queryCount": 5,
  "queryRecurrence": 1,
  "generateHdt": true,
  "endpointUrl": "http://localhost:3001/sparql",
  "queryRunnerReplication": 3,
  "queryRunnerWarmupRounds": 1,
  "queryRunnerRecordTimestamps": true,
  "hookSparqlEndpoint": {
    "@id": "urn:jrb:comunica-performance:hookSparqlEndpoint",
    "@type": "HookSparqlEndpointLdf",
    "dockerfile": "input/dockerfiles/Dockerfile-ldf-server",
    "dockerfileCache": "input/dockerfiles/Dockerfile-ldf-server-cache",
    "resourceConstraints": {
      "@type": "StaticDockerResourceConstraints",
      "cpu_percentage": 100
    },
    "config": "input/config-ldf-server.json",
    "portServer": 2999,
    "portCache": 3000,
    "workers": 4,
    "maxMemory": 8192,
    "dataset": "generated/dataset.hdt",
    "hookSparqlEndpointLdfEngine": {
      "@id": "urn:jrb:comunica-performance:hookSparqlEndpoint_hookSparqlEndpointLdfEngine",
      "@type": "HookSparqlEndpointComunica",
      "dockerfileClient": "input/dockerfiles/Dockerfile-client",
      "resourceConstraints": {
        "@type": "StaticDockerResourceConstraints",
        "cpu_percentage": 100
      },
      "configClient": "input/config-client.json",
      "clientPort": 3001,
      "clientLogLevel": "info",
      "queryTimeout": 300,
      "maxMemory": 8192
    }
  }
}

For example, you can increase the WatDiv dataset size ten-fold by setting the datasetScale field to 10.
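Concretely, that change amounts to editing a single field in jbr-experiment.json:

```json
"datasetScale": 10,
```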

More fine-grained configurations can be altered by modifying the files within the input/ directory. More information on the meaning of these configurations can be found within the README files of the respective experiment types and hook types.

What we MUST still do is set up the connection between the Comunica engine and the LDF server. For this, we need to set the contents of input/context-client.json as follows:

{
  "sources": [ "http://cache/dataset" ]
}

This will make sure that all queries that Comunica receives make use of the (cache proxy over the) LDF server (exposing a single dataset).
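If you prefer to script this step, the same file content (as shown above) can be written from the shell:

```shell
# Write the client context pointing Comunica at the cache proxy.
mkdir -p input
cat > input/context-client.json << 'EOF'
{
  "sources": [ "http://cache/dataset" ]
}
EOF
```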

4. Prepare

Before we can actually run the experiment, we must first trigger the prepare phase, which performs all preprocessing the experiment needs. In this case, it generates the WatDiv dataset and queries.

$ jbr prepare

🧩 Preparing experiment combination 0
Building LDF server Docker image
Building LDF server cache Docker image
Preparing LDF engine hook
Generating WatDiv dataset and queries
Converting WatDiv dataset to HDT
✨ Done in 15.04s

Once that is done, your generated/ directory will now contain the following files:

generated/
  dataset.hdt
  dataset.hdt.index.v1-1
  dataset.nt
  queries

5. Run

Once our experimental setup has been finalized, we can run the actual experiment as follows:

$ jbr run

Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Warming up for 1 rounds
Executed all queries for iteration 1/1
Executing 20 queries with replication 3
Executed all queries for iteration 3/3
Writing results to /Users/rtaelman/experiments/ldf-performance/output

✨ Done in 154.79s

The most important output file is output/query-times.csv, which contains the execution time for each query. Aside from this file, all logs and stats (CPU, memory, I/O) are also available within the output/ directory.

The output/ directory will look as follows:

output/
  query-times.csv
  stats-sparql-endpoint-comunica.csv
  stats-sparql-endpoint-ldf-cache.csv
  stats-sparql-endpoint-ldf-server.csv
  logs/
    sparql-endpoint-comunica.txt
    sparql-endpoint-ldf-cache.txt
    sparql-endpoint-ldf-server.txt
    watdiv-generation.txt
    watdiv-hdt-index.txt
    watdiv-hdt.txt

These files can be processed by any tool.
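As a quick sketch, a one-liner such as the following can compute the mean query time; note that the semicolon-separated layout and column order used here are assumptions for illustration, so check the header of your own query-times.csv before reusing it:

```shell
# Sample data in an assumed query-times.csv-style layout (last column = time in ms).
printf 'name;id;time\nC1-0;0;120\nC2-0;0;80\n' > sample-query-times.csv

# Average the last column over all data rows (skipping the header).
awk -F';' 'NR > 1 { total += $NF; n++ } END { printf "mean time: %.1f ms\n", total / n }' sample-query-times.csv
# prints: mean time: 100.0 ms
```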

One possible tool for processing them is psbr, which can generate TikZ-based plots via the command psbr tex query output/, producing output such as:

Example output
