Importing Data to Virtuoso - DDMAL/linkedmusic-datalake GitHub Wiki
This document shows how to use the bulk loader to:
- Prepare your host folders and Virtuoso container
- Transfer RDF files (e.g., Turtle
.ttl
) from your local machine to Virtuoso - Create per-folder
.graph
files to assign each set of TTLs to its own graph IRI - Use
isql
to register and bulk-load each subfolder into Virtuoso - Verify that both graphs contain data
This guide is divided in two sections. The first applies to the production server only. For staging, see the Staging section. Note: As of 9 June 2025, Liam is not sure why the directory structure is different on staging and production. This should be investigated. See #347.
We will be following the process outlined in this guide, which describes how to use Virtuoso's bulk loader for data import. Note that it requires the data to already be in one of the following RDF formats:
-
.grdf
– Geospatial RDF -
.nq
– N-Quads -
.nt
– N-Triples -
.owl
– OWL -
.rdf
– RDF/XML -
.trig
– TriG -
.ttl
– Turtle -
.xml
– RDF/XML
For this project, we are currently mainly using .ttl
for its general ease-of-use, although other formats have not been extensively tested.
First, SSH into the Virtuoso server (see How to SSH into Virtuoso).
Next, you will need Docker permission to add to the database. Reach out to the Virtuoso admin (Liam) and give him your username. This is the same username you use to SSH into the server. Alternatively, a system admin (Hanwen, Dylan) can add you using the following commands:
sudo usermod -aG docker <USERNAME>
sudo usermod -aG virtuoso-users <USERNAME>
You can verify everything is working by running docker ps
.
Create a new directory under /srv/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>
with:
mkdir -p /srv/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>
Remember to replace <NEW_DATABASE_NAME> with the name of your new database.
Next, from your local machine (i.e., Terminal), use rsync
or scp
to transfer your RDF files from your local machine to Virtuoso. The data are located in the /srv/virtuoso/my_virtdb/data/
directory. In the Docker container, this folder is visible at /database/data
.
rsync -avz -e ssh \
"LOCAL_PATH_TO_DATA" \
ddmal.prod_virtuoso:/srv/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>
Note, you may need to add --rsync-path="sudo rsync"
if it says you don't have the right permissions. If this is the case, let an admin know to create an issue for it.
rsync -avz -e ssh \
"LOCAL_PATH_TO_DATA" \
--rsync-path="sudo rsync" \
ddmal.prod_virtuoso:/srv/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>
Then, confirm it's on the server.
ssh ddmal.prod_virtuoso
ls /srv/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>
Virtuoso uses .graph
files to assign uploaded data to its target graph IRI.
nano /srv/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>/global.graph
Then, paste the link to the homepage of the new database. Make sure there are no extra spaces or newlines. For example, for DIAMM, this would be:
https://www.diamm.ac.uk/
When you run the bulk loader, it will by default look for a global.graph
file in the same directory as the data and assign the global.graph
file's IRI to all data found in that directory. If you would like to override this default, you can create a separate .graph
file. For example, if you have SAMPLE_SPECIAL_DATA.ttl
, you can make a SAMPLE_SPECIAL_DATA.ttl.graph
file and specify a different IRI in that .graph
file. Ensure that the names of the files are identical except for the .graph
file extension.
- Enter the
isql
shell. The credentials here are the same as what you use for Virtuoso Conductor on the web. Note that your Virtuoso account will needdba
permissions. Contact Liam, Dylan, or Hanwen for access or to make a new account.
docker exec -it my_virtdb isql 1111 <DBA_USERNAME>
Then enter your password.
- Register the new data. If you get "Access denied," see Troubleshooting.
DB.DBA.ld_dir(
'/database/data/<NEW_DATABASE_NAME>', -- container path
'*.ttl', -- match all TTLs in that folder (replace with desired extension)
NULL -- NULL ⇒ look for <NEW_DATABASE_NAME>.ttl.graph
);
- Run the bulk loader
DB.DBA.rdf_loader_run();
- Verify the status of the new database
SELECT ll_file, ll_graph, ll_state, ll_error
FROM DB.DBA.load_list
WHERE ll_file LIKE '%/<NEW_DATABASE_NAME>/%';
- Commit loaded triples
checkpoint;
Add a prefix for each entity type in the database. See this section of the Virtuoso setup guide for instructions.
At the SQL>
prompt, run the following. Remember to replace both <NEW_DATABASE_NAME>
and <GRAPH_IRI>
. Note that the graph IRI should remain enclosed in < >
brackets.
SPARQL
SELECT (COUNT(*) AS ?<NEW_DATABASE_NAME>Count)
WHERE { GRAPH <<GRAPH_IRI>> { ?s ?p ?o } };
For example, for DIAMM, this would be:
SPARQL
SELECT (COUNT(*) AS ?diammCount)
WHERE { GRAPH <https://www.diamm.ac.uk/> { ?s ?p ?o } };
You should see a non-zero number representing the number of triples in your new database.
First, SSH into the Virtuoso server (see How to SSH into Virtuoso).
Next, you will need Docker permission to add to the database. Reach out to the Virtuoso admin (Liam) and give him your username. This is the same username you use to SSH into the server. Alternatively, a system admin (Hanwen, Dylan) can add you using the following commands:
sudo usermod -aG docker <USERNAME>
sudo usermod -aG virtuoso-users <USERNAME>
You can verify everything is working by running docker ps
.
Create a new directory under /srv/webapps/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>
with:
mkdir -p /srv/webapps/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>
Remember to replace <NEW_DATABASE_NAME> with the name of your new database.
Next, from your local machine (i.e., Terminal), use rsync
or scp
to transfer your RDF files from your local machine to Virtuoso. The data are located in the /srv/webapps/virtuoso/my_virtdb/data/
directory. In the Docker container, this folder is visible at /database/data
.
rsync -avz -e ssh \
"LOCAL_PATH_TO_DATA" \
ddmal.staging_virtuoso:/srv/webapps/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>
Note, you may need to add --rsync-path="sudo rsync"
if it says you don't have the right permissions. If this is the case, let an admin know to create an issue for it.
rsync -avz -e ssh \
"LOCAL_PATH_TO_DATA" \
--rsync-path="sudo rsync" \
ddmal.staging_virtuoso:/srv/webapps/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>
Then, confirm it's on the server.
ssh ddmal.staging_virtuoso
ls /srv/webapps/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>
Virtuoso uses .graph
files to assign uploaded data to its target graph IRI.
nano /srv/webapps/virtuoso/my_virtdb/data/<NEW_DATABASE_NAME>/global.graph
Then, paste the link to the homepage of the new database. Make sure there are no extra spaces or newlines. For example, for DIAMM, this would be:
https://www.diamm.ac.uk/
When you run the bulk loader, it will by default look for a global.graph
file in the same directory as the data and assign the global.graph
file's IRI to all data found in that directory. If you would like to override this default, you can create a separate .graph
file. For example, if you have SAMPLE_SPECIAL_DATA.ttl
, you can make a SAMPLE_SPECIAL_DATA.ttl.graph
file and specify a different IRI in that .graph
file. Ensure that the names of the files are identical except for the .graph
file extension.
- Enter the
isql
shell. The credentials here are the same as what you use for Virtuoso Conductor on the web. Note that your Virtuoso account will needdba
permissions. Contact Liam, Dylan, or Hanwen for access or to make a new account.
docker exec -it my_virtdb isql 1111 <DBA_USERNAME>
Then enter your password.
- Register the new data. If you get "Access denied," see Troubleshooting.
DB.DBA.ld_dir(
'/database/data/<NEW_DATABASE_NAME>', -- container path
'*.ttl', -- match all TTLs in that folder (replace with desired extension)
NULL -- NULL ⇒ look for <NEW_DATABASE_NAME>.ttl.graph
);
- Run the bulk loader
DB.DBA.rdf_loader_run();
- Verify the status of the new database
SELECT ll_file, ll_graph, ll_state, ll_error
FROM DB.DBA.load_list
WHERE ll_file LIKE '%/<NEW_DATABASE_NAME>/%';
- Commit loaded triples
checkpoint;
Add a prefix for each entity type in the database. See this section of the Virtuoso setup guide for instructions.
At the SQL>
prompt, run the following. Remember to replace both <NEW_DATABASE_NAME>
and <GRAPH_IRI>
. Note that the graph IRI should remain enclosed in < >
brackets.
SPARQL
SELECT (COUNT(*) AS ?<NEW_DATABASE_NAME>Count)
WHERE { GRAPH <<GRAPH_IRI>> { ?s ?p ?o } };
For example, for DIAMM, this would be:
SPARQL
SELECT (COUNT(*) AS ?diammCount)
WHERE { GRAPH <https://www.diamm.ac.uk/> { ?s ?p ?o } };
You should see a non-zero number representing the number of triples in your new database.
This is likely because Virtuoso restricts directories via DirsAllowed
. To fix it, you need to modify virtuoso.ini
on the host.
sudo -i
nano /srv/virtuoso/my_virtdb/virtuoso.ini
sudo -i
nano /srv/webapps/virtuoso/my_virtdb/virtuoso.ini
Find the [Parameters]
section and make sure it has:
[Parameters]
...
DirsAllowed = ., ../vad, /usr/share/proj, /database, /database/data
...
Save (Ctrl+O
, Enter
, Ctrl+X
) and restart the Docker container:
sudo docker restart my_virtdb
If this still doesn't work, try whitelisting my_virtdb
instead:
[Parameters]
...
DirsAllowed = ., ../vad, /usr/share/proj, /`my_virtdb`, /`my_virtdb`/data
...
This could happen for a number of reasons. However, it means that no RDF files were registered. Double check that your .graph
files are not missing, misnamed, or in the wrong place. The filenames must either be <NEW_DATABASE_NAME>.<EXTENSION>.graph
(i.e., musicbrainz.ttl.graph
) for per-file .graph
loading, or you must have a single global.graph
file in the root directory for your new database.
-
There is a way to load
.csv
files instead of RDF directly. However, as of 4 June 2025, Liam has not investigated how to do this (see #294 and #319). -
We should find a way to do the entire data import process without needing
sudo
so that less-experienced users can still upload data to Virtuoso (see #317). For Docker permissions, we may be able to use Docker groups. adba
admin user (Liam, Dylan, Hanwen) can make a Docker group, then add users withsudo usermod -aG docker <username>
.