Importing Data to Virtuoso - DDMAL/linkedmusic-datalake GitHub Wiki

Overview

This document shows how to use the bulk loader to:

  1. Prepare your host folders and Virtuoso container
  2. Transfer RDF files (e.g., Turtle .ttl) from your local machine to Virtuoso
  3. Create per-folder .graph files to assign each set of TTLs to its own graph IRI
  4. Use isql to register and bulk-load each subfolder into Virtuoso
  5. Verify that both graphs contain data

Table of Contents:

This guide is divided in two sections. The first applies to the production server only. For staging, see the Staging section. Note: As of 9 June 2025, Liam is not sure why the directory structure is different on staging and production. This should be investigated. See #347.

We will be following the process outlined in this guide, which describes how to use Virtuoso's bulk loader for data import. Note that it requires the data to already be in one of the following RDF formats:

For this project, we are currently mainly using .ttl for its general ease-of-use, although other formats have not been extensively tested.

Production

1. SSH into Virtuoso and Verify Docker Permissions

First, SSH into the Virtuoso server (see How to SSH into Virtuoso).

Next, you will need Docker permission to add to the database. Reach out to the Virtuoso admin (Liam) and give him your username. This is the same username you use to SSH into the server. Alternatively, a system admin (Hanwen, Dylan) can add you using the following commands:

sudo usermod -aG docker <USERNAME>
sudo usermod -aG virtuoso-users <USERNAME>

You can verify everything is working by running docker ps.

2. Create and Organize Host Directories

Create a new directory under /srv/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME> with:

mkdir -p /srv/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>

Remember to replace <NEW_DATABASE_NAME> with the name of your new database.

3. Transfer RDF Files from Local Machine

Next, from your local machine (i.e., Terminal), use rsync or scp to transfer your RDF files from your local machine to Virtuoso. The data are located in the /srv/virtuoso/my_virtdb/data/ directory. In the Docker container, this folder is visible at /database/data.

rsync -avz -e ssh \
  "LOCAL_PATH_TO_DATA" \
  ddmal.prod_virtuoso:/srv/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>

Note, you may need to add --rsync-path="sudo rsync" if it says you don't have the right permissions. If this is the case, let an admin know to create an issue for it.

rsync -avz -e ssh \
  "LOCAL_PATH_TO_DATA" \
  --rsync-path="sudo rsync" \
  ddmal.prod_virtuoso:/srv/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>

Then, confirm it's on the server.

ssh ddmal.prod_virtuoso
ls /srv/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>

4. Create Per-Folder .graph Files

Virtuoso uses .graph files to assign uploaded data to its target graph IRI.

nano /srv/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>/global.graph

Then, paste the link to the homepage of the new database. Make sure there are no extra spaces or newlines. For example, for DIAMM, this would be:

https://www.diamm.ac.uk/

When you run the bulk loader, it will by default look for a global.graph file in the same directory as the data and assign the global.graph file's IRI to all data found in that directory. If you would like to override this default, you can create a separate .graph file. For example, if you have SAMPLE_SPECIAL_DATA.ttl, you can make a SAMPLE_SPECIAL_DATA.ttl.graph file and specify a different IRI in that .graph file. Ensure that the names of the files are identical except for the .graph file extension.

5. Run the Bulk Loader

  1. Enter the isql shell. The credentials here are the same as what you use for Virtuoso Conductor on the web. Note that your Virtuoso account will need dba permissions. Contact Liam, Dylan, or Hanwen for access or to make a new account.
docker exec -it my_virtdb isql 1111 <DBA_USERNAME>

Then enter your password.

  1. Register the new data. If you get "Access denied," see Troubleshooting.
DB.DBA.ld_dir(
  '/database/data/<NEW_DATABASE_NAME>',  -- container path
  '*.ttl',                               -- match all TTLs in that folder (replace with desired extension)
  NULL                                   -- NULL ⇒ look for <NEW_DATABASE_NAME>.ttl.graph
);
  1. Run the bulk loader
DB.DBA.rdf_loader_run();
  1. Verify the status of the new database
SELECT ll_file, ll_graph, ll_state, ll_error
  FROM DB.DBA.load_list
 WHERE ll_file LIKE '%/<NEW_DATABASE_NAME>/%';
  1. Commit loaded triples
checkpoint;

6. Add Prefixes for all Entity Types

Add a prefix for each entity type in the database. See this section of the Virtuoso setup guide for instructions.

7. Verify Graph Contents with SPARQL

At the SQL> prompt, run the following. Remember to replace both <NEW_DATABASE_NAME> and <GRAPH_IRI>. Note that the graph IRI should remain enclosed in < > brackets.

SPARQL
  SELECT (COUNT(*) AS ?<NEW_DATABASE_NAME>Count)
    WHERE { GRAPH <<GRAPH_IRI>> { ?s ?p ?o } };

For example, for DIAMM, this would be:

SPARQL
  SELECT (COUNT(*) AS ?diammCount)
    WHERE { GRAPH <https://www.diamm.ac.uk/> { ?s ?p ?o } };

You should see a non-zero number representing the number of triples in your new database.

Staging

1. SSH into Virtuoso and Verify Docker Permissions

First, SSH into the Virtuoso server (see How to SSH into Virtuoso).

Next, you will need Docker permission to add to the database. Reach out to the Virtuoso admin (Liam) and give him your username. This is the same username you use to SSH into the server. Alternatively, a system admin (Hanwen, Dylan) can add you using the following commands:

sudo usermod -aG docker <USERNAME>
sudo usermod -aG virtuoso-users <USERNAME>

You can verify everything is working by running docker ps.

2. Create and Organize Host Directories

Create a new directory under /srv/webapps/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME> with:

mkdir -p /srv/webapps/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>

Remember to replace <NEW_DATABASE_NAME> with the name of your new database.

3. Transfer RDF Files from Local Machine

Next, from your local machine (i.e., Terminal), use rsync or scp to transfer your RDF files from your local machine to Virtuoso. The data are located in the /srv/webapps/virtuoso/my_virtdb/data/ directory. In the Docker container, this folder is visible at /database/data.

rsync -avz -e ssh \
  "LOCAL_PATH_TO_DATA" \
  ddmal.staging_virtuoso:/srv/webapps/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>

Note, you may need to add --rsync-path="sudo rsync" if it says you don't have the right permissions. If this is the case, let an admin know to create an issue for it.

rsync -avz -e ssh \
  "LOCAL_PATH_TO_DATA" \
  --rsync-path="sudo rsync" \
  ddmal.staging_virtuoso:/srv/webapps/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>

Then, confirm it's on the server.

ssh ddmal.staging_virtuoso
ls /srv/webapps/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>

4. Create Per-Folder .graph Files

Virtuoso uses .graph files to assign uploaded data to its target graph IRI.

nano /srv/webapps/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>/global.graph

Then, paste the link to the homepage of the new database. Make sure there are no extra spaces or newlines. For example, for DIAMM, this would be:

https://www.diamm.ac.uk/

When you run the bulk loader, it will by default look for a global.graph file in the same directory as the data and assign the global.graph file's IRI to all data found in that directory. If you would like to override this default, you can create a separate .graph file. For example, if you have SAMPLE_SPECIAL_DATA.ttl, you can make a SAMPLE_SPECIAL_DATA.ttl.graph file and specify a different IRI in that .graph file. Ensure that the names of the files are identical except for the .graph file extension.

5. Run the Bulk Loader

  1. Enter the isql shell. The credentials here are the same as what you use for Virtuoso Conductor on the web. Note that your Virtuoso account will need dba permissions. Contact Liam, Dylan, or Hanwen for access or to make a new account.
docker exec -it my_virtdb isql 1111 <DBA_USERNAME>

Then enter your password.

  1. Register the new data. If you get "Access denied," see Troubleshooting.
DB.DBA.ld_dir(
  '/database/data/<NEW_DATABASE_NAME>',  -- container path
  '*.ttl',                               -- match all TTLs in that folder (replace with desired extension)
  NULL                                   -- NULL ⇒ look for <NEW_DATABASE_NAME>.ttl.graph
);
  1. Run the bulk loader
DB.DBA.rdf_loader_run();
  1. Verify the status of the new database
SELECT ll_file, ll_graph, ll_state, ll_error
  FROM DB.DBA.load_list
 WHERE ll_file LIKE '%/<NEW_DATABASE_NAME>/%';
  1. Commit loaded triples
checkpoint;

6. Add Prefixes for all Entity Types

Add a prefix for each entity type in the database. See this section of the Virtuoso setup guide for instructions.

7. Verify Graph Contents with SPARQL

At the SQL> prompt, run the following. Remember to replace both <NEW_DATABASE_NAME> and <GRAPH_IRI>. Note that the graph IRI should remain enclosed in < > brackets.

SPARQL
  SELECT (COUNT(*) AS ?<NEW_DATABASE_NAME>Count)
    WHERE { GRAPH <<GRAPH_IRI>> { ?s ?p ?o } };

For example, for DIAMM, this would be:

SPARQL
  SELECT (COUNT(*) AS ?diammCount)
    WHERE { GRAPH <https://www.diamm.ac.uk/> { ?s ?p ?o } };

You should see a non-zero number representing the number of triples in your new database.

Troubleshooting

  • "Access denied" when running ld_dir()

This is likely because Virtuoso restricts directories via DirsAllowed. To fix it, you need to modify virtuoso.ini on the host.

Production:

sudo -i
nano /srv/virtuoso/my_virtdb/virtuoso.ini

Staging:

sudo -i
nano /srv/webapps/virtuoso/my_virtdb/virtuoso.ini

Find the [Parameters] section and make sure it has:

[Parameters]
  ...
  DirsAllowed = ., ../vad, /usr/share/proj, /database, /database/data
  ...

Save (Ctrl+O, Enter, Ctrl+X) and restart the Docker container:

sudo docker restart my_virtdb

If this still doesn't work, try whitelisting my_virtdb instead:

[Parameters]
  ...
  DirsAllowed = ., ../vad, /usr/share/proj, /`my_virtdb`, /`my_virtdb`/data
  ...
  • Number of triples is zero when running DB.DBA.load_list

This could happen for a number of reasons. However, it means that no RDF files were registered. Double check that your .graph files are not missing, misnamed, or in the wrong place. The filenames must either be <NEW_DATABASE_NAME>.<EXTENSION>.graph (i.e., musicbrainz.ttl.graph) for per-file .graph loading, or you must have a single global.graph file in the root directory for your new database.

Future Work

  • There is a way to load .csv files instead of RDF directly. However, as of 4 June 2025, Liam has not investigated how to do this (see #294 and #319).

  • We should find a way to do the entire data import process without needing sudo so that less-experienced users can still upload data to Virtuoso (see #317). For Docker permissions, we may be able to use Docker groups. a dba admin user (Liam, Dylan, Hanwen) can make a Docker group, then add users with sudo usermod -aG docker <username>.

⚠️ **GitHub.com Fallback** ⚠️