Example: setting up a Linked Data Fragments experiment
This guide gives a quick example of how jbr can be used to initialize, prepare, and run an experiment.
Requirements: Node.js with npm, and Docker (the experiment runs its components in Docker containers).
If you haven't done this yet, make sure to install jbr as follows:
$ npm install -g jbr
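You can verify that the installation succeeded by listing the globally installed package, which also shows the installed version:
$ npm ls -g jbr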
Let's create an experiment for measuring the performance of Comunica over an LDF server (with NGINX cache) using the WatDiv benchmark:
$ jbr init watdiv ldf-performance
Initialized new experiment in /Users/rtaelman/experiments/ldf-performance
This experiment requires handlers for the following hooks before it can be used:
- hookSparqlEndpoint
Initialize these hooks by calling 'jbr set-hook <hook> <handler>'
✨ Done in 11.98s
After executing this command, the ldf-performance directory will have been created, so let's navigate to it:
$ cd ldf-performance
It is highly recommended to add your experiment to a git repository and place it on a platform such as GitHub:
$ git init
$ git add .
$ git commit -m "Initial commit"
Note that a .gitignore file has been created for you, so it is safe to just add all files to the git repo.
The output of the init command told us that we still need to configure a handler for the hookSparqlEndpoint hook, which will be the SPARQL endpoint our benchmark targets.
Let's plug in a hook for the LDF-based engine:
$ jbr set-hook hookSparqlEndpoint sparql-endpoint-ldf
Handler 'sparql-endpoint-ldf' has been set for hook 'hookSparqlEndpoint' in experiment 'ldf-performance'
This hook requires the following sub-hooks before it can be used:
- hookSparqlEndpoint/hookSparqlEndpointLdfEngine
Initialize these hooks by calling 'jbr set-hook <hook> <handler>'
✨ Done in 2.80s
As the output shows, this hook requires another sub-hook, which is needed to execute SPARQL queries over the Triple Pattern Fragments interface of the LDF server.
For this, we can use the Comunica engine:
$ jbr set-hook hookSparqlEndpoint/hookSparqlEndpointLdfEngine sparql-endpoint-comunica
Handler 'sparql-endpoint-comunica' has been set for hook 'hookSparqlEndpoint/hookSparqlEndpointLdfEngine' in experiment 'ldf-performance'
At this stage, your experiment is nearly executable.
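To double-check that all hooks are wired up correctly, recent jbr versions offer a validate command (availability may depend on your version; treat this as an assumption and skip it if your installation lacks it):
$ jbr validate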
Optionally, you can configure your experimental setup by modifying the jbr-experiment.json file, which looks as follows:
{
  "@context": [
    "https://linkedsoftwaredependencies.org/bundles/npm/jbr/^0.0.0/components/context.jsonld",
    "https://linkedsoftwaredependencies.org/bundles/npm/@jbr-experiment/watdiv/^0.0.0/components/context.jsonld",
    "https://linkedsoftwaredependencies.org/bundles/npm/@jbr-hook/sparql-endpoint-comunica/^0.0.0/components/context.jsonld",
    "https://linkedsoftwaredependencies.org/bundles/npm/@jbr-hook/sparql-endpoint-ldf/^0.0.0/components/context.jsonld"
  ],
  "@id": "urn:jrb:ldf-performance",
  "@type": "ExperimentWatDiv",
  "datasetScale": 1,
  "queryCount": 5,
  "queryRecurrence": 1,
  "generateHdt": true,
  "endpointUrl": "http://localhost:3001/sparql",
  "queryRunnerReplication": 3,
  "queryRunnerWarmupRounds": 1,
  "queryRunnerRecordTimestamps": true,
  "hookSparqlEndpoint": {
    "@id": "urn:jrb:ldf-performance:hookSparqlEndpoint",
    "@type": "HookSparqlEndpointLdf",
    "dockerfile": "input/dockerfiles/Dockerfile-ldf-server",
    "dockerfileCache": "input/dockerfiles/Dockerfile-ldf-server-cache",
    "resourceConstraints": {
      "@type": "StaticDockerResourceConstraints",
      "cpu_percentage": 100
    },
    "config": "input/config-ldf-server.json",
    "portServer": 2999,
    "portCache": 3000,
    "workers": 4,
    "maxMemory": 8192,
    "dataset": "generated/dataset.hdt",
    "hookSparqlEndpointLdfEngine": {
      "@id": "urn:jrb:ldf-performance:hookSparqlEndpoint_hookSparqlEndpointLdfEngine",
      "@type": "HookSparqlEndpointComunica",
      "dockerfileClient": "input/dockerfiles/Dockerfile-client",
      "resourceConstraints": {
        "@type": "StaticDockerResourceConstraints",
        "cpu_percentage": 100
      },
      "configClient": "input/config-client.json",
      "clientPort": 3001,
      "clientLogLevel": "info",
      "queryTimeout": 300,
      "maxMemory": 8192
    }
  }
}
For example, you can increase the WatDiv dataset size ten-fold by setting datasetScale to 10.
More fine-grained configurations can be altered by modifying the files within the input/ directory.
More information on the meaning of these configurations can be found within the README files of the respective experiment types and hook types.
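For instance, a ten-fold dataset with five replications per query would use the following values in jbr-experiment.json (a fragment; all other fields stay unchanged, and these numbers are purely illustrative):
"datasetScale": 10,
"queryRunnerReplication": 5,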
What we still must do is set up the connection between the Comunica engine and the LDF server.
For this, we need to set the contents of input/context-client.json as follows:
{
  "sources": [ "http://cache/dataset" ]
}
This will make sure that all queries that Comunica receives are executed against the (cache proxy over the) LDF server, which exposes a single dataset.
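As an aside, Comunica contexts are not limited to a single source: the sources array may list several entries, over which the engine will federate. A purely illustrative example (for this benchmark, keep the single cache source so results stay comparable) adds a public TPF interface next to the local one:
{
  "sources": [
    "http://cache/dataset",
    "https://fragments.dbpedia.org/2016-04/en"
  ]
}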
Before we can actually run the experiment, we must first trigger the prepare phase, which ensures everything the experiment needs is in place. In this case, it generates the WatDiv dataset and queries, and converts the dataset to HDT.
$ jbr prepare
🧩 Preparing experiment combination 0
Building LDF server Docker image
Building LDF server cache Docker image
Preparing LDF engine hook
Generating WatDiv dataset and queries
Converting WatDiv dataset to HDT
✨ Done in 15.04s
Once that is done, your generated/ directory will contain the following files:
generated/
  dataset.hdt
  dataset.hdt.index.v1-1
  dataset.nt
  queries
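Since N-Triples files hold one triple per line, you can get a quick sanity check on the dataset size by counting lines (at datasetScale 1, WatDiv produces roughly 100K triples, though the exact count may vary):
$ wc -l generated/dataset.nt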
Once our experimental setup has been finalized, we can run the actual experiment as follows:
$ jbr run
Endpoint not available yet, waiting for 1 second
Endpoint not available yet, waiting for 1 second
Warming up for 1 rounds
Executed all queries for iteration 1/1
Executing 20 queries with replication 3
Executed all queries for iteration 3/3
Writing results to /Users/rtaelman/experiments/ldf-performance/output
✨ Done in 154.79s
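While jbr run is in progress, the SPARQL endpoint configured in endpointUrl is temporarily reachable, so you can sanity-check it by hand from a second terminal. A minimal example using the standard SPARQL HTTP protocol (the query shown is illustrative):
$ curl -G 'http://localhost:3001/sparql' \
    -H 'Accept: application/sparql-results+json' \
    --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 5'
Keep in mind that manual queries add load to the endpoint, so avoid doing this during a measurement you intend to keep.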
The most important output file is output/query-times.csv, which contains the execution time for each query.
Aside from this file, all logs and stats (CPU, memory, I/O) are also available within the output/ directory, which will look as follows:
output/
  query-times.csv
  stats-sparql-endpoint-comunica.csv
  stats-sparql-endpoint-ldf-cache.csv
  stats-sparql-endpoint-ldf-server.csv
  logs/
    sparql-endpoint-comunica.txt
    sparql-endpoint-ldf-cache.txt
    sparql-endpoint-ldf-server.txt
    watdiv-generation.txt
    watdiv-hdt-index.txt
    watdiv-hdt.txt
These files can be processed by any tool.
One possible tool is psbr, which can generate TikZ-based plots from this output via psbr tex query output/.
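If you prefer a quick custom analysis instead, a few lines of scripting suffice. The sketch below averages the execution time per query; the ';' delimiter and the name/time column names are assumptions, so check the header of your own query-times.csv first, as the exact layout may differ across jbr versions:
import csv
from collections import defaultdict

# Minimal sketch: average execution time per query from jbr's query-times.csv.
# The delimiter and column names below are assumptions; verify them against
# the header line of your own output file before relying on the results.
times = defaultdict(list)
with open('output/query-times.csv', newline='') as f:
    for row in csv.DictReader(f, delimiter=';'):
        times[row['name']].append(float(row['time']))

for name, values in sorted(times.items()):
    print(f"{name}: avg {sum(values) / len(values):.1f} ms over {len(values)} runs")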