Virtuoso Setup Guide - DDMAL/linkedmusic-datalake GitHub Wiki
This guide details how to get Virtuoso up and running on the server or on a local machine.
When setting up locally, it isn't worth going through all the configuration/setup steps. Instead, follow this shorter list:
- Move to your home folder (cd ~) and follow the Docker setup step. You can keep the dba password as mysecret, since you're the only one with access to the machine.
- I would recommend changing only the following Virtuoso settings. All settings are located in the virtuoso.ini file in the my_virtdb folder. You can edit it from the command line (with vim) or with any text editor, such as VSCode.
  - Add /database and /database/data to the DirsAllowed setting.
  - Set both buffer settings (NumberOfBuffers and MaxDirtyBuffers) to ~85% of the value recommended for the amount of RAM that Docker has (8 GB by default on Mac), and make sure you comment out the default settings for those (with a ;).
  - If you're getting errors about the estimated time for queries, comment out the MaxQueryCostEstimationTime setting. Long queries don't matter here, since this is not a production setting.
  Once you've made changes to the file, restart Virtuoso (docker restart my_virtdb) so that they take effect.
- Log into the isql shell (docker exec -it my_virtdb isql -U dba -P mysecret) and run the following:
DB.DBA.RDF_DEFAULT_USER_PERMS_SET ('SPARQL', 7);
DB.DBA.RDF_DEFAULT_USER_PERMS_SET ('nobody', 7);
-- For federated SPARQL query search, see https://community.openlinksw.com/t/sparql-federated-query/4162/4
grant execute on "DB.DBA.SPARQL_SINV_IMP" to "SPARQL";
grant select on "DB.DBA.SPARQL_SINV_2" to "SPARQL";
grant SPARQL_SELECT to "SPARQL";
grant SPARQL_SELECT_FED to "SPARQL";
- Follow the "Import the data into Virtuoso" step below to import the data.
To download the ttl file(s) and the global.graph file from the Virtuoso server, run the following command from the data folder on your local machine:
rsync -rtvz -e ssh ddmal.prod_virtuoso:/srv/virtuoso/my_virtdb/data/<database_name> .
Do not put a trailing slash after the name of the folder: with a trailing slash, rsync copies the folder's contents instead of the folder itself. As an example, for diamm:
rsync -rtvz -e ssh ddmal.prod_virtuoso:/srv/virtuoso/my_virtdb/data/diamm .
- For prefixes, I'd recommend setting up only wd:, wdt:, wikibase:, and the main database prefix (for example, only diamm: for diamm). See this section for instructions on how to do that.
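The ~85% buffer sizing from the settings step is simple arithmetic, so here is a minimal sketch of a hypothetical helper (virt_buffers is a made-up name, not a Virtuoso tool). It assumes the 8 GB suggestion in the default virtuoso.ini (NumberOfBuffers = 680000, MaxDirtyBuffers = 500000, i.e. 85000 and 62500 per GB) scales linearly with RAM; check the suggested values in your own virtuoso.ini before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Virtuoso): suggest values for the two
# buffer settings in virtuoso.ini for a given amount of RAM.
# Assumption: the 8 GB suggestion in the default virtuoso.ini
# (680000 / 500000) scales linearly with RAM.
virt_buffers() {
  local ram_gb=$1      # RAM available to Docker/Virtuoso, in whole GB
  local pct=${2:-85}   # percentage of the suggested values to use
  case $ram_gb in
    ''|*[!0-9]*) echo "usage: virt_buffers RAM_GB [PERCENT]" >&2; return 1 ;;
  esac
  case $pct in
    ''|*[!0-9]*) echo "usage: virt_buffers RAM_GB [PERCENT]" >&2; return 1 ;;
  esac
  local per_gb_buffers=85000   # 680000 / 8
  local per_gb_dirty=62500     # 500000 / 8
  echo "NumberOfBuffers = $(( ram_gb * per_gb_buffers * pct / 100 ))"
  echo "MaxDirtyBuffers = $(( ram_gb * per_gb_dirty * pct / 100 ))"
}
```

For Docker's 8 GB, virt_buffers 8 prints NumberOfBuffers = 578000 and MaxDirtyBuffers = 425000. As a sanity check, virt_buffers 6 100 reproduces the 510000 / 375000 figures quoted for production in the parameter table further down.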
If you have a local instance of Virtuoso that you want to forward to the staging server so that others can access it, follow these steps:
- Shut down the Docker container on the staging server (docker stop my_virtdb).
- Ensure that the SSH settings are correct on the server.
On the server, modify the /etc/ssh/sshd_config file to ensure that the following lines are present. You will need to use sudo vim (or your text editor of choice) to modify the file.
GatewayPorts yes
AllowTcpForwarding yes
If you modified the file, run sudo systemctl reload sshd to update the configuration.
- Forward the server's port 8890 to your local machine's port 8890 (Virtuoso's HTTP server).
On your machine, run the following command. This reverse tunnel is what allows others to access your Virtuoso instance through the server.
ssh -N -f -R 0.0.0.0:8890:localhost:8890 ddmal.staging_virtuoso
This command runs the reverse SSH tunnel in the background. To stop the tunnel, first run ps aux | grep 'ssh -N' to find the PID of the tunnel; it is the value in the second column. Once you have the PID of the SSH process, run kill <PID> to stop it.
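The two manual checks in these steps can be scripted. This is a minimal sketch with hypothetical helper names: sshd_forwarding_ok reads configuration lines on stdin (for example the effective configuration printed by sudo sshd -T, OpenSSH's extended test mode) and succeeds only if both options are yes; tunnel_pids reads ps aux output and prints the second column of ssh -N lines, skipping the filter process itself.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the reverse-tunnel steps (names are made up).

# Reads sshd configuration lines on stdin (e.g. from `sudo sshd -T`) and
# succeeds only if GatewayPorts and AllowTcpForwarding are both "yes".
sshd_forwarding_ok() {
  local cfg
  cfg=$(tr '[:upper:]' '[:lower:]')   # option names are case-insensitive
  printf '%s\n' "$cfg" | grep -Eq '^gatewayports[[:space:]]+yes$' &&
    printf '%s\n' "$cfg" | grep -Eq '^allowtcpforwarding[[:space:]]+yes$'
}

# Reads `ps aux` output on stdin and prints the PID (second column) of each
# `ssh -N` process. Lines mentioning grep or awk are skipped so that the
# filter does not report itself.
tunnel_pids() {
  awk '/ssh -N/ && !/grep|awk/ { print $2 }'
}
```

On the server, sudo sshd -T | sshd_forwarding_ok && echo ok confirms the settings took effect; on your machine, ps aux | tunnel_pids lists the PIDs to pass to kill. Alternatively, pkill -f 'ssh -N -f -R 0.0.0.0:8890' should stop the tunnel in one step.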
The staging server (https://virtuoso.staging.simssa.ca) was set up according to the instructions below. For information on the server itself, see the DDMAL internal Wiki.
(official Virtuoso Docker setup guide here)
- Pull the Docker image (line 1) and check the image version (optional, line 2; --rm removes the throwaway container afterwards).
sudo docker pull openlink/virtuoso-opensource-7
sudo docker run --rm openlink/virtuoso-opensource-7 version
- Start a Docker container.
sudo mkdir my_virtdb
cd my_virtdb
sudo docker run \
  --name my_virtdb \
  --detach \
  --env DBA_PASSWORD=mysecret \
  --publish 1111:1111 \
  --publish 8890:8890 \
  --volume "$(pwd)":/database \
  openlink/virtuoso-opensource-7:latest
This creates a new Virtuoso database in the my_virtdb subdirectory and starts a Virtuoso instance with the HTTP server listening on port 8890 and the ISQL data server listening on port 1111.
Note that you should change DBA_PASSWORD to the desired password.
- Go to the local server at http://localhost:8890/ and log into Conductor using the following (with the password you set in DBA_PASSWORD):
username: dba
password: mysecret
- Go to System Admin > Packages. Download conductor, fct, iSPARQL, and rdf_mappers (download rdf_mappers [here](http://download3.openlinksw.com/uda/vad-vos-packages/7.2/rdf_mappers_dav.vad) and install it from upload). You can find the rest of the packages here if they are not already installed.
- Check that faceted search works at http://localhost:8890/fct/, and try SPARQL at http://localhost:8890/sparql/.
- Configure data and permissions.
Open the ISQL CLI (docker exec -it my_virtdb isql -U dba -P mysecret) and run:
-- Permission for Sponging (optional)
-- see https://github.com/openlink/virtuoso-opensource/issues/1180
DB.DBA.RDF_DEFAULT_USER_PERMS_SET ('SPARQL', 7);
DB.DBA.RDF_DEFAULT_USER_PERMS_SET ('nobody', 7);
-- Post Installation Setup for Virtuoso Faceted Browser
-- see: https://vos.openlinksw.com/owiki/wiki/VOS/VirtFacetBrowserInstallConfig#Post%20Installation
RDF_OBJ_FT_RULE_ADD (null, null, 'All');
VT_INC_INDEX_DB_DBA_RDF_OBJ ();
urilbl_ac_init_db();
s_rank();
-- For federated SPARQL query search, see https://community.openlinksw.com/t/sparql-federated-query/4162/4
grant execute on "DB.DBA.SPARQL_SINV_IMP" to "SPARQL";
grant select on "DB.DBA.SPARQL_SINV_2" to "SPARQL";
grant SPARQL_SELECT to "SPARQL";
grant SPARQL_SELECT_FED to "SPARQL";
Note: Make sure to rerun these lines after loading a new JSON-LD file (to update the text index and the entity label table):
VT_INC_INDEX_DB_DBA_RDF_OBJ ();
urilbl_ac_init_db();
This can be done before or after the configuration.
- Create a data folder
While in the my_virtdb directory, run the following command to create a directory in which you'll put the data:
mkdir data
- Import the data into Virtuoso
Follow the Importing and Updating Data on Virtuoso Guide to import data into Virtuoso's database.
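Once data is loaded, a quick way to confirm that the endpoint answers queries (as in the SPARQL check above) is a small curl wrapper. This is a minimal sketch: sparql_query, SPARQL_ENDPOINT and CURL are hypothetical names, and it assumes the standard SPARQL protocol GET form (a URL-encoded query parameter) with a JSON results Accept header.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test for the /sparql endpoint.
# SPARQL_ENDPOINT defaults to the local instance; CURL can be overridden
# (e.g. CURL=echo) to print the request instead of sending it.
SPARQL_ENDPOINT=${SPARQL_ENDPOINT:-http://localhost:8890/sparql}

sparql_query() {
  local query=$1
  # -f: fail on HTTP errors; -sS: no progress bar, but still show errors.
  # -G: send the --data-urlencode pair as a URL query string (HTTP GET).
  ${CURL:-curl} -fsS -G "$SPARQL_ENDPOINT" \
    --data-urlencode "query=$query" \
    -H 'Accept: application/sparql-results+json'
}
```

For example, sparql_query 'SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }' should return a JSON result with the triple count; a non-zero exit status means the endpoint is not answering. Set SPARQL_ENDPOINT to test another instance.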
From Conductor, navigate to "Linked Data">"Namespaces" to view the list of configured prefixes and to add your own, adding for example these prefixes for Wikidata:
wd: http://www.wikidata.org/entity/
wdt: http://www.wikidata.org/prop/direct/
When adding prefixes on the webpage, do not include the : in the prefix name, and do not include the angle brackets (<>) in the URI.
Once you have added all prefixes, run a checkpoint. To do this, log into the isql prompt and run the checkpoint; command. Without a checkpoint, the changes might not be saved properly the next time Virtuoso restarts.
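As an alternative to the Conductor form, the prefixes can be registered from isql. This is a minimal sketch with a hypothetical helper (prefix_sql is a made-up name) that turns "prefix URI" pairs into SQL, stripping the : and <> exactly as the webpage requires, and ends with a checkpoint. It assumes Virtuoso's DB.DBA.XML_SET_NS_DECL (prefix, uri, 2), where 2 is taken to mean a persistent declaration; check the Virtuoso documentation before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "prefix URI" pairs on stdin and print iSQL
# statements that register them, followed by a checkpoint.
# Assumes DB.DBA.XML_SET_NS_DECL (prefix, uri, 2) persists the declaration.
prefix_sql() {
  local prefix uri
  # "|| [ -n ... ]" also handles a last line without a trailing newline.
  while read -r prefix uri || [ -n "$prefix" ]; do
    [ -z "$prefix" ] && continue
    prefix=${prefix%:}            # drop the trailing colon (wd: -> wd)
    uri=${uri#<}; uri=${uri%>}    # drop angle brackets around the URI
    printf "DB.DBA.XML_SET_NS_DECL ('%s', '%s', 2);\n" "$prefix" "$uri"
  done
  echo 'checkpoint;'
}
```

For example, printf 'wd: http://www.wikidata.org/entity/\nwdt: http://www.wikidata.org/prop/direct/\nwikibase: http://wikiba.se/ontology#\n' | prefix_sql > prefixes.sql writes a script you can paste into the isql prompt (feeding it via docker exec -i my_virtdb isql -U dba -P mysecret < prefixes.sql may also work, but that is untested here).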
Note: The current Virtuoso staging instance doesn't sponge external information. This documentation is here in case we decide to do it in the future.
Sponging retrieves external RDF data that can be reached from the loaded JSON-LD (e.g. Wikidata RDF). After discussing it with Ich, it is still unclear whether this is what we want.
(See more about sponging here)
In interactive SQL (ISQL), run the following (adjust grab-depth and grab-limit as needed):
SPARQL
define input:grab-all "yes"
define input:grab-depth 2
define input:grab-limit 100
SELECT *
FROM NAMED <urn:test>
WHERE { GRAPH ?g { ?s ?p ?o } };
Explanation of the query above:
When the query is executed, new named graphs (NNGs) may appear in your local Virtuoso instance, named after instances from the <urn:test> graph. As long as an instance is an accessible URL (call it A), i.e. a webpage that can actually be visited, the Sponger can fetch the URLs that A links to (B1, B2, ...) and convert them into RDF in the NNGs.
To focus on sponging Wikidata fields:
SPARQL
define input:grab-all "yes"
define input:grab-depth 5
define input:grab-limit 20
SELECT ?s ?p ?o
FROM NAMED <urn:test>
WHERE {
GRAPH ?g {
?s ?p ?o .
FILTER(STRSTARTS(STR(?p), "http://www.wikidata.org/"))
}
};
If you have CheckpointAuditTrail set to 1 in virtuoso.ini, you should also configure Virtuoso to put all transaction files in their own directory; otherwise they will fill up the main directory.
To do this, first shut down Virtuoso by running shutdown; in the isql prompt.
Then, in the virtuoso.ini file, change the TransactionFile setting to reflect the new path, keeping the same filename. As an example, change ../database/virtuoso20250702133900.trx to ../database/transaction-logs/virtuoso20250702133900.trx.
Then, run the following commands to make the new folder and move all transaction files to it. The paths are written for the setup on the Virtuoso production server so change them if your paths are different.
sudo mkdir /srv/virtuoso/my_virtdb/transaction-logs
sudo mv /srv/virtuoso/my_virtdb/*.trx /srv/virtuoso/my_virtdb/transaction-logs/
Both commands use sudo because the files and folders are all owned by the root user.
Finally, restart Virtuoso with docker restart my_virtdb.
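The same procedure can be sketched as two hypothetical shell functions (set_trx_path and move_trx_files are made-up names). This minimal sketch assumes the TransactionFile value has the ../database/<name>.trx form shown above and that the host folder mounted at /database is /srv/virtuoso/my_virtdb; sed -i.bak keeps a backup of the original file as virtuoso.ini.bak.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the transaction-log move. Run it in a root shell
# with Virtuoso already shut down.

# Rewrite "TransactionFile = ../database/<name>.trx" so the file lives in
# ../database/transaction-logs/, keeping the same filename. Lines that
# already point into a subfolder are left unchanged.
set_trx_path() {
  local ini=$1
  sed -i.bak \
    -e 's#^\(TransactionFile[[:space:]]*=[[:space:]]*\.\./database/\)\([^/]*\.trx\)[[:space:]]*$#\1transaction-logs/\2#' \
    "$ini"
}

# Create the transaction-logs folder and move every .trx file into it.
move_trx_files() {
  local db_dir=$1 f
  mkdir -p "$db_dir/transaction-logs" || return 1
  for f in "$db_dir"/*.trx; do
    [ -e "$f" ] || continue   # no .trx files: the glob stays unexpanded
    mv "$f" "$db_dir/transaction-logs/" || return 1
  done
}
```

In a root shell (for example after sudo -i), load the functions and run set_trx_path /srv/virtuoso/my_virtdb/virtuoso.ini && move_trx_files /srv/virtuoso/my_virtdb, then restart the container as described above.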
First, you'll want to create the virtuoso-users group:
sudo groupadd virtuoso-users
You'll then want to add users to the group. Run the following command for each user you want to add (each user needs to log out and back in for the new group membership to take effect):
sudo usermod -aG virtuoso-users <USERNAME>
Then, make virtuoso-users the group owner of the data folder and all its contents.
sudo chgrp -R virtuoso-users /srv/virtuoso/my_virtdb/data/
Next, set the setgid bit on the data folder and all its subfolders. This ensures that newly created files and folders inherit the virtuoso-users group. The command also gives the owner and group permission to traverse all directories in the data folder.
find /srv/virtuoso/my_virtdb/data/ -type d -exec sudo chmod g+s,ug+x {} \;
Also give read/write permissions to the group so that all virtuoso-users users can edit the files.
sudo chmod -R g+rw /srv/virtuoso/my_virtdb/data/
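To verify the result, here is a minimal sketch of a hypothetical check (check_data_perms is a made-up name) that lists every directory under the data folder that is missing the setgid bit or does not belong to the expected group. It only looks at directories, not file read/write bits, and find's -group test requires the group to exist.

```shell
#!/usr/bin/env bash
# Hypothetical check for the permission setup: print any directory under
# the data folder that lacks the setgid bit or the expected group.
check_data_perms() {
  local dir=${1:-/srv/virtuoso/my_virtdb/data}
  local group=${2:-virtuoso-users}
  # -perm -2000 matches directories with the setgid bit set; "!" negates it.
  find "$dir" -type d \( ! -perm -2000 -o ! -group "$group" \) -print
}
```

Running check_data_perms with no arguments should print nothing once the commands above have been applied; any directory it prints needs the chgrp/chmod steps rerun.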
Below is a complete list of modifications made to the default virtuoso.ini file on the production server. Staging uses default settings as of 18 June 2025 (with the exception of the DirsAllowed change). The referenced documentation was found on this page; however, it appears to be significantly out of date. The default virtuoso.ini shown in the documentation is much smaller than the default that was on production, some of the parameters have been changed or removed, and many of the default values are different.
Parameter | Default Value | New Value | Reason |
---|---|---|---|
DirsAllowed | ., ../vad, /usr/share/proj | ., ../vad, /usr/share/proj, /database, /database/data | In order to use the bulk loader, you must enable access to the database directory. Resolves the "access denied" issue when running ld_dir. See this Wiki page. |
NumberOfBuffers | 10000 | 400000 | virtuoso.ini suggests the following: "when running with large data sets, one should configure the Virtuoso process to use between 2/3 to 3/5 of free system memory". For the 6 GB we have available on production, this would be 510000 (linearly interpolating the suggested values for 4 GB and 8 GB in virtuoso.ini). Resolves #383. Further reduced to 400000 to reduce memory usage. See #392. |
MaxDirtyBuffers | 6000 | 300000 | virtuoso.ini suggests the following: "when running with large data sets, one should configure the Virtuoso process to use between 2/3 to 3/5 of free system memory". For the 6 GB we have available on production, this would be 375000 (linearly interpolating the suggested values for 4 GB and 8 GB in virtuoso.ini). Resolves #383. Further reduced to 300000 to reduce memory usage. See #392. |
FileExtend | 100 | 5000 | Increased FileExtend to improve performance during database growth. This reduces the frequency of small I/O operations by allocating additional space in larger 40 MB chunks (5000 pages of 8 KB each), which is more efficient for large or growing RDF datasets. |
CheckpointAuditTrail | 0 | 1 | Enabled CheckpointAuditTrail to ensure that each checkpoint generates a new transaction log file, preserving a complete history of database changes. This provides a reliable audit trail and improves recovery options in the event of system failure or data corruption. This may not be necessary (see #403). |
FreeTextBatchSize | 100000 | 10000000 | FreeTextBatchSize controls how much text data (in bytes) is processed per indexing batch. Increased to allow larger chunks of text data to be indexed per batch during full-text indexing, reducing overhead and improving performance for large RDF loads and reindexing operations. Increase further to speed up indexing at the cost of RAM. ChatGPT 4o recommended 30 MB instead of 10 MB, but we're generally tight on RAM and don't care too much about speed, so I lowered this number. |
AdjustVectorSize | 0 | 1 | Enabled AdjustVectorSize to allow Virtuoso to dynamically increase the number of rows processed per batch during query execution. This improves performance in large or fragmented queries by reducing random I/O and increasing cache and disk locality, even when using a single disk. It allows the engine to respond adaptively to the data access pattern without wasting resources on small queries. |
HTTPLogFile | logs/http.log (commented out) | logs/http.log | This enables logging to logs/http.log. This is the default path, although it is commented out by default. |
HTTPLogFormat | N/A (new variable) | %h %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i" "%{NetId}e" | This logging format is the default suggested on this page. |
SQL_PREFETCH_ROWS | 100 | 1000 | The maximum number of rows the server will send in response to a single fetch call. For example, if the query returns 5000 rows, the client will now send 5 requests of 1000 rows instead of 50 requests of 100. This should be adjusted once we know how large the average query is. |
SQL_PREFETCH_BYTES | 16000 | 131072 (128 KB) | Same as SQL_PREFETCH_ROWS above, but in bytes. |
MaxQueryExecutionTime | 60 (seconds) | 900 (seconds) | Maximum execution time for one query. Increased to 15 minutes since queries may be large. |
MaxMemInUse | 0 (unlimited) | 1000000000 | 1 GB is an arbitrarily large but bounded number that caps the size of result structures (e.g. intermediate hash tables or construct dictionaries). MaxQueryExecutionTime would likely kick in first, but this should be bounded just in case. |
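Applying these values by hand is error-prone because some keys (for example ServerThreads and MaxClientConnections) appear in more than one section of virtuoso.ini. Below is a minimal sketch of a hypothetical section-aware helper (ini_set is a made-up name). It skips commented lines starting with ; and leaves the file untouched if the key isn't found in that section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "KEY = VALUE" inside one [SECTION] of an ini file.
# Commented lines (starting with ;) are ignored. Returns 1 and leaves the
# file unchanged if the key is not found in that section.
ini_set() {
  local file=$1 section=$2 key=$3 value=$4 tmp
  tmp=$(mktemp) || return 1
  if awk -v sec="[$section]" -v key="$key" -v val="$value" '
       /^[ \t]*\[/ { in_sec = ($1 == sec) }   # entering a new [section]
       in_sec && $0 ~ ("^[ \t]*" key "[ \t]*=") { print key " = " val; found = 1; next }
       { print }
       END { exit !found }
     ' "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # overwrite in place, keeping the owner and mode
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Assuming the standard layout, where the buffer settings live under [Parameters] and MaxQueryExecutionTime under [SPARQL], you would run ini_set virtuoso.ini Parameters NumberOfBuffers 400000 and ini_set virtuoso.ini SPARQL MaxQueryExecutionTime 900, then restart Virtuoso.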
Below is a list of parameters that were not modified from the default, but could be considered in the future.
Parameter | Default Value | Reason |
---|---|---|
MaxClientConnections | 10 | A maximum of 10 users can connect through SPARQL, HTTP, or SQL at once. |
ServerThreads | 10 | Same as MaxClientConnections above. |
O_DIRECT | 0 | Potential performance improvements. See #388. |
IndexTreeMaps | 64 | Potential performance improvements. See #389. |
ResultSetMaxRows | 10000 | Cuts off results at this value. Increase to allow users to make very large queries. |
MaxConstructTriples | 10000 | Similar to above; restricts the maximum size of a CONSTRUCT result. |
MaxQueryCostEstimationTime | 400 (seconds) | Virtuoso rejects a query before execution if its estimated execution time exceeds this value. Reduce to reject costly queries faster. Raise if we have very complex federated/inference queries that legitimately need more time. |
SQL_QUERY_TIMEOUT | 0 (unlimited) | Same as MaxQueryExecutionTime above (adjusted from 60 to 900 seconds), but client side. Leaving it as unlimited because MaxQueryExecutionTime should kick in first. Don't feel strongly either way about this one. |