Virtuoso Setup Guide - DDMAL/linkedmusic-datalake GitHub Wiki

Virtuoso Setup Guide

This guide details how to get Virtuoso up and running on the server or on a local machine.

Local Setup

A local setup does not need every configuration step described below. Follow this shorter list instead:

  1. Move to your home folder (cd ~) and follow the Set up docker step. You can keep the dba password as mysecret since you're the only one with access to the machine.

  2. I would recommend changing only the following Virtuoso settings:

All settings are located in the virtuoso.ini file, in the my_virtdb folder. You can edit it via the command line (with vim) or with any text editor like VSCode.

  • Add /database and /database/data to the DirsAllowed setting
  • Set both buffer settings (NumberOfBuffers and MaxDirtyBuffers) to ~85% of the value recommended in virtuoso.ini for the amount of RAM that Docker has (8 GB by default on Mac), and comment out the default values for those settings (with a ;)
  • If you run into issues with the estimated time for queries, comment out the MaxQueryCostEstimationTime setting. Long-running queries are not a concern since this is not a production setup

Once you've made changes to the file, restart Virtuoso (docker restart my_virtdb) so that they take effect.
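
As a rough sketch of the buffer arithmetic above, the snippet below scales the 8 GB recommendations by 85%. The base values (680000 for NumberOfBuffers and 500000 for MaxDirtyBuffers) are assumed from the commented-out suggestions in a stock virtuoso.ini, and the helper name scale_buffers is hypothetical. Check the values in your own file before copying the results.

```shell
# Scale a recommended buffer count by a percentage (integer arithmetic).
# Usage: scale_buffers <recommended_value> <percent>
scale_buffers() {
    echo $(( $1 * $2 / 100 ))
}

# Assumed 8 GB recommendations from the comments in virtuoso.ini; verify them
# against your own copy of the file.
number_of_buffers=$(scale_buffers 680000 85)
max_dirty_buffers=$(scale_buffers 500000 85)

echo "NumberOfBuffers = ${number_of_buffers}"
echo "MaxDirtyBuffers = ${max_dirty_buffers}"
```

If Docker has a different amount of RAM, swap in the recommendation for that size and keep the same percentage.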

  3. Log into the isql shell (docker exec -it my_virtdb isql -U dba -P mysecret) and run the following:
    DB.DBA.RDF_DEFAULT_USER_PERMS_SET ('SPARQL', 7);
    DB.DBA.RDF_DEFAULT_USER_PERMS_SET ('nobody', 7);

    -- For federated SPARQL query search, see https://community.openlinksw.com/t/sparql-federated-query/4162/4
    grant execute on "DB.DBA.SPARQL_SINV_IMP" to "SPARQL";
    grant select on "DB.DBA.SPARQL_SINV_2" to "SPARQL";

    grant SPARQL_SELECT to "SPARQL";
    grant SPARQL_SELECT_FED to "SPARQL";
  4. Follow the adding data step to import the data.

To download the ttl file(s) and the global.graph file from the Virtuoso server, run the following command from the data folder on your local machine:

rsync -rtvz -e ssh ddmal.prod_virtuoso:/srv/virtuoso/my_virtdb/data/<database_name> .

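Do not put a trailing slash after the name of the folder: with a trailing slash, rsync copies the folder's contents into the current directory instead of the folder itself. As an example, for diamm: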

rsync -rtvz -e ssh ddmal.prod_virtuoso:/srv/virtuoso/my_virtdb/data/diamm .
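
The trailing-slash rule is easy to get wrong, so here is a small sketch of a wrapper that strips any trailing slash before building the command. The function name build_rsync_command is hypothetical, and it only prints the command so that it can be reviewed before it is run.

```shell
# Print the rsync command for one database folder, removing any trailing
# slashes from the name so the folder itself is copied, not just its contents.
build_rsync_command() {
    name="$1"
    while [ "${name%/}" != "$name" ]; do
        name="${name%/}"
    done
    echo "rsync -rtvz -e ssh ddmal.prod_virtuoso:/srv/virtuoso/my_virtdb/data/${name} ."
}

build_rsync_command "diamm/"
```

The printed line can be pasted into the terminal from the data folder on your local machine.
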
  5. For prefixes, I'd recommend setting up only wd:, wdt:, wikibase: and the main ones for the database (for example, only diamm: for diamm). See the "Add Namespace Prefixes to facilitate SPARQL queries" section below for instructions on how to do that.

Forwarding a local Virtuoso instance to the staging server

If you have a local instance of Virtuoso that you want to forward to the staging server so that others can access it, follow these steps:

  1. Shut down the docker container on the staging server (docker stop my_virtdb)

  2. Ensure that ssh settings are correct on the server.

On the server, modify the /etc/ssh/sshd_config file to ensure that the following lines are present. You will need to use sudo vim (or your text editor of choice) to modify the file.

GatewayPorts yes
AllowTcpForwarding yes

If you modified the file, run sudo systemctl reload sshd to update the configuration.
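
One way to confirm that both directives are active is to grep for them. This is a minimal sketch: the helper name check_sshd_forwarding is hypothetical, and it only inspects the file you pass in, so it does not account for settings overridden in included files or Match blocks.

```shell
# Succeed only if the given sshd_config has both directives set to "yes"
# on uncommented lines.
check_sshd_forwarding() {
    config="$1"
    grep -Eiq '^[[:space:]]*GatewayPorts[[:space:]]+yes[[:space:]]*$' "$config" &&
        grep -Eiq '^[[:space:]]*AllowTcpForwarding[[:space:]]+yes[[:space:]]*$' "$config"
}

# Example: check_sshd_forwarding /etc/ssh/sshd_config && echo "forwarding enabled"
```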

  3. Forward the server's port 8890 to your local machine's port 8890 (Virtuoso's HTTP server)

On your machine, run the following command to forward the server's port 8890 to your machine's port 8890. This is what allows others to access the Virtuoso server.

ssh -N -f -R 0.0.0.0:8890:localhost:8890 ddmal.staging_virtuoso

This command will run the reverse SSH tunnel in the background. To stop the tunnel, first run ps aux | grep 'ssh -N' to find the PID of the tunnel. It will be the value in the second column. Once you have the PID for the SSH process, run kill <PID> to stop it.
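
The PID lookup can also be scripted. The sketch below reads ps aux output and prints the second column for every line that looks like the tunnel command above. The function name tunnel_pids is hypothetical, and the pattern assumes the tunnel was started with exactly the ssh -N -f -R flags shown in this guide.

```shell
# Print the PID (second column of `ps aux` output) of every reverse-tunnel
# process found on standard input, skipping the awk/grep helpers themselves.
tunnel_pids() {
    awk '/ssh -N -f -R/ && !/awk|grep/ { print $2 }'
}

# Example: ps aux | tunnel_pids
```

Review the printed PIDs before passing any of them to kill.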

Current Staging Instance

The staging server (https://virtuoso.staging.simssa.ca) was set up according to the instructions below. For information on the server itself, see the DDMAL internal Wiki.

Set up docker

(official Virtuoso Docker setup guide here)

  1. Pull the docker image (line 1) and check the image version (optional, line 2).

    sudo docker pull openlink/virtuoso-opensource-7
    sudo docker run openlink/virtuoso-opensource-7 version
  2. Start a docker container.

    sudo mkdir my_virtdb
    cd my_virtdb
    sudo docker run \
        --name my_virtdb \
        --detach \
        --env DBA_PASSWORD=mysecret \
        --publish 1111:1111 \
        --publish  8890:8890 \
        --volume "$(pwd)":/database \
        openlink/virtuoso-opensource-7:latest

This creates a new Virtuoso database in the my_virtdb subdirectory and starts a Virtuoso instance with the HTTP server listening on port 8890 and the ISQL data server listening on port 1111.

Note that you should change the DBA_PASSWORD to the desired password.

Add packages to virtuoso

  1. Go to the local server at http://localhost:8890/ and log into Conductor using:

    username: dba
    password: mysecret

  2. Go to System Admin > Packages and install conductor, fct, iSPARQL, and rdf_mappers. For rdf_mappers, download the package [here](http://download3.openlinksw.com/uda/vad-vos-packages/7.2/rdf_mappers_dav.vad) and install it from upload. You can find the rest of the packages here if not previously installed.

  3. Check that faceted search works at http://localhost:8890/fct/ and try SPARQL at http://localhost:8890/sparql/.

  4. Configure data and permissions.

    Open the ISQL CLI (docker exec -it my_virtdb isql -U dba -P mysecret) and run the following:

    -- Permission for Sponging (optional)
    -- see https://github.com/openlink/virtuoso-opensource/issues/1180
    
    DB.DBA.RDF_DEFAULT_USER_PERMS_SET ('SPARQL', 7); 
    DB.DBA.RDF_DEFAULT_USER_PERMS_SET ('nobody', 7); 
    
    -- Post Installation Setup for Virtuoso Faceted Browser
    -- see: https://vos.openlinksw.com/owiki/wiki/VOS/VirtFacetBrowserInstallConfig#Post%20Installation
    RDF_OBJ_FT_RULE_ADD (null, null, 'All');
    VT_INC_INDEX_DB_DBA_RDF_OBJ ();
    urilbl_ac_init_db();
    s_rank();
    
    -- For federated SPARQL query search, see https://community.openlinksw.com/t/sparql-federated-query/4162/4
    grant execute on "DB.DBA.SPARQL_SINV_IMP" to "SPARQL";
    grant select on "DB.DBA.SPARQL_SINV_2" to "SPARQL";
    
    grant SPARQL_SELECT to "SPARQL";
    grant SPARQL_SELECT_FED to "SPARQL";

Note: Make sure to rerun these lines after loading new JSON-LD data, so that the text index and the entity label table are updated:

    VT_INC_INDEX_DB_DBA_RDF_OBJ ();
    urilbl_ac_init_db();

Add data to the local instance

This can be done before or after the configuration.

  1. Create a data folder

While in the my_virtdb directory, run the following command to create a directory in which you'll put the data:

mkdir data

  2. Import the data into Virtuoso

Follow the Importing and Updating Data on Virtuoso Guide to import data into Virtuoso's database.

Other configurations:

1. Add Namespace Prefixes to facilitate SPARQL queries

From Conductor, navigate to "Linked Data">"Namespaces" to view the list of configured prefixes and to add your own, adding for example these prefixes for Wikidata:

wd: http://www.wikidata.org/entity/
wdt: http://www.wikidata.org/prop/direct/

When adding prefixes on the webpage, do not include the : in the prefix name, and do not include the angle brackets (<>) in the URI.

Once you have added all prefixes, run a checkpoint. To do this, log into the isql prompt and run the checkpoint; command. Without the checkpoint command, the changes might not be properly saved next time Virtuoso restarts.

2. Sponger

Optional: Sponge URLs within the JSON-LD

Note: The current Virtuoso staging instance doesn't sponge external information. This documentation is here in case we decide to do it in the future.

Sponging retrieves external RDF data that can be reached from the loaded JSON-LD (i.e. Wikidata RDF). After discussing with Ich, this might or might not be what we want.

(See more about sponging here)

In interactive SQL (ISQL), run the following, adjusting grab-depth and grab-limit as needed:

SPARQL
define input:grab-all "yes" define input:grab-depth 2 define input:grab-limit 100
SELECT * 
FROM NAMED <urn:test>
WHERE { GRAPH ?g { ?s ?p ?o } };

Explanation of the query above:

After the query runs, new named graphs may appear in your local Virtuoso instance. These graphs are named after resources found in the <urn:test> graph. Whenever such a resource is an accessible URL (call it A), that is, a page that can actually be visited, the Sponger can also fetch the URLs linked with A (B1, B2, ...) and convert them into RDF in the new named graphs.

To focus on sponging Wikidata fields:

SPARQL
define input:grab-all "yes"
define input:grab-depth 5
define input:grab-limit 20

SELECT ?s ?p ?o
FROM NAMED <urn:test>
WHERE {
  GRAPH ?g {
    ?s ?p ?o .
    FILTER(STRSTARTS(STR(?p), "http://www.wikidata.org/"))
  }
};

3. Move transaction log files to their own directory

If you have CheckpointAuditTrail set to 1 in virtuoso.ini, you should also configure Virtuoso to put all transaction files in their own directory, otherwise it will fill up the main directory.

To do this, first shut down Virtuoso by running shutdown; in the isql prompt.

Then, in the virtuoso.ini file, change the TransactionFile setting to reflect the new path, keeping the same filename. As an example, change ../database/virtuoso20250702133900.trx to ../database/transaction-logs/virtuoso20250702133900.trx.
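
If you prefer not to edit the path by hand, the change can be sketched with sed. The function name move_trx_setting is hypothetical. It prints the modified file to standard output instead of editing in place, assumes the TransactionFile value contains at least one directory separator, and leaves lines that already point into transaction-logs untouched.

```shell
# Print the given virtuoso.ini with "transaction-logs/" inserted before the
# .trx filename on the TransactionFile line. The input file is not modified.
move_trx_setting() {
    sed -E '/\/transaction-logs\//!s|^(TransactionFile[[:space:]]*=[[:space:]]*.*/)([^/]+\.trx)[[:space:]]*$|\1transaction-logs/\2|' "$1"
}

# Example: move_trx_setting virtuoso.ini > virtuoso.ini.new
```

Compare the output with the original (for instance with diff) before replacing virtuoso.ini.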

Then, run the following commands to make the new folder and move all transaction files to it. The paths are written for the setup on the Virtuoso production server so change them if your paths are different.

sudo mkdir /srv/virtuoso/my_virtdb/transaction-logs
sudo mv /srv/virtuoso/my_virtdb/*.trx /srv/virtuoso/my_virtdb/transaction-logs/

sudo is needed for mkdir and mv because the files and folders are all owned by the root user.

Finally, restart Virtuoso with docker restart my_virtdb.

4. Create a virtuoso-users group so that users can add/remove data without sudo

First, you'll want to create the virtuoso-users group:

sudo groupadd virtuoso-users

You'll then want to add users to the group. Run the following command for each user you want to add:

sudo usermod -aG virtuoso-users <USERNAME>

Then, make virtuoso-users the group owner of the data folder and all its contents.

sudo chgrp -R virtuoso-users /srv/virtuoso/my_virtdb/data/

Next, set the setgid bit on the data folder and all subfolders. This will ensure that all newly created files and folders will keep the virtuoso-users group. The command will also give the group and owner permission to traverse all directories in the data folder.

find /srv/virtuoso/my_virtdb/data/ -type d -exec sudo chmod g+s,ug+x {} \;

Also give read/write permissions to the group so that all members of virtuoso-users can edit the files.

sudo chmod -R g+rw /srv/virtuoso/my_virtdb/data/
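
To check that the setgid step took effect, something like the following sketch can list any directory that is still missing the bit. The function name dirs_missing_setgid is hypothetical. It relies only on find's -perm test and does not change any permissions.

```shell
# List directories under the given path that do not have the setgid bit set.
# Prints nothing when every directory is configured as expected.
dirs_missing_setgid() {
    find "$1" -type d ! -perm -2000
}

# Example: dirs_missing_setgid /srv/virtuoso/my_virtdb/data/
```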

virtuoso.ini Configurations

Below is a complete list of modifications made to the default virtuoso.ini file on the production server. Staging is using default settings as of 18 June 2025 (with the exception of the DirsAllowed change). Referenced documentation was found on this page. However, it appears to be significantly out of date. The default virtuoso.ini file is much smaller in the documentation than the default that was on production. Some of the parameters have been changed or removed and many of the default values are different.

| Parameter | Default Value | New Value | Reason |
| --- | --- | --- | --- |
| DirsAllowed | ., ../vad, /usr/share/proj | ., ../vad, /usr/share/proj, /database, /database/data | In order to use the bulk loader, you must enable access to the database directory. Resolves "access denied" issue when running ld_dir. See this Wiki page. |
| NumberOfBuffers | 10000 | 400000 | virtuoso.ini suggests the following: when running with large data sets, one should configure the Virtuoso process to use between 2/3 to 3/5 of free system memory. For the 6 GB we have available on production, this would be 510000 (linearly interpolating the suggested values for 4 GB and 8 GB in virtuoso.ini). Resolves #383. Further reduced to 400000 to reduce memory usage. See #392. |
| MaxDirtyBuffers | 6000 | 300000 | virtuoso.ini suggests the following: when running with large data sets, one should configure the Virtuoso process to use between 2/3 to 3/5 of free system memory. For the 6 GB we have available on production, this would be 375000 (linearly interpolating the suggested values for 4 GB and 8 GB in virtuoso.ini). Resolves #383. Further reduced to 300000 to reduce memory usage. See #392. |
| FileExtend | 100 | 5000 | Increased FileExtend to improve performance during database growth. This reduces the frequency of small I/O operations by allocating additional space in larger 40 MB chunks (8 KB per buffer), which is more efficient for large or growing RDF datasets. |
| CheckpointAuditTrail | 0 | 1 | Enabled CheckpointAuditTrail to ensure that each checkpoint generates a new transaction log file, preserving a complete history of database changes. This provides a reliable audit trail and improves recovery options in the event of system failure or data corruption. This may not be necessary (see #403). |
| FreeTextBatchSize | 100000 | 10000000 | FreeTextBatchSize controls how much text data (in bytes) is processed per indexing batch. Increased to allow larger chunks of text data to be indexed per batch during full-text indexing, reducing overhead and improving performance for large RDF loads and reindexing operations. Increase further to speed up indexing at the cost of RAM. ChatGPT 4o recommended 30 MB instead of 10 MB, but I know we're generally tight on RAM and don't care too much about speed, so I lowered this number. |
| AdjustVectorSize | 0 | 1 | Enabled AdjustVectorSize to allow Virtuoso to dynamically increase the number of rows processed per batch during query execution. This improves performance in large or fragmented queries by reducing random I/O and increasing cache and disk locality, even when using a single disk. It allows the engine to respond adaptively to the data access pattern without wasting resources on small queries. |
| HTTPLogFile | logs/http.log (commented out) | logs/http.log | This enables logging to logs/http.log. This is the default path, although it is commented out by default. |
| HTTPLogFormat | N/A (new variable) | %h %u %t "%r" %s %b "%{Referer}i" "%{User-agent}i" "%{NetId}e" | This logging format is the default suggested in this page. |
| SQL_PREFETCH_ROWS | 100 | 1000 | The maximum number of rows the server will send in response to a single fetch call. For example, if the query returns 5000 rows, the client will now send 5 requests of 1000 rows instead of 50 requests of 100. This should be adjusted once we know how large the average query is. |
| SQL_PREFETCH_BYTES | 16000 | 131072 (~128 KB) | Same as SQL_PREFETCH_ROWS above but for bytes. |
| MaxQueryExecutionTime | 60 (seconds) | 900 | Maximum execution time for one query. Increased to 15 minutes since queries may be large. |
| MaxMemInUse | 0 (Unlimited) | 1000000000 | 1 GB is an arbitrarily large but bounded number that caps the size of result structures (e.g. intermediate hash tables or construct dictionaries). MaxQueryExecutionTime would likely kick in first but this should be bounded just in case. |
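
The linear interpolation mentioned in the NumberOfBuffers and MaxDirtyBuffers entries can be reproduced with a few lines of shell arithmetic. This is only a sketch: the helper name interpolate is hypothetical, and the 4 GB and 8 GB reference values are assumed from the commented-out suggestions in a stock virtuoso.ini.

```shell
# Linearly interpolate a buffer setting between two reference points.
# Usage: interpolate <ram_lo_gb> <value_lo> <ram_hi_gb> <value_hi> <ram_gb>
interpolate() {
    echo $(( $2 + ($4 - $2) * ($5 - $1) / ($3 - $1) ))
}

# 4 GB and 8 GB suggestions assumed from the comments in virtuoso.ini.
interpolate 4 340000 8 680000 6   # NumberOfBuffers for 6 GB
interpolate 4 250000 8 500000 6   # MaxDirtyBuffers for 6 GB
```

With those reference values, the two calls print 510000 and 375000, the starting points quoted above before the values were reduced further.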

Below is a list of parameters that were not modified from the default, but could be considered in the future.

| Parameter | Default Value | Reason |
| --- | --- | --- |
| MaxClientConnections | 10 | A maximum of 10 users can connect through SPARQL, HTTP, or SQL at once. |
| ServerThreads | 10 | Same as MaxClientConnections above. |
| O_DIRECT | 0 | Potential performance improvements. See #388. |
| IndexTreeMaps | 64 | Potential performance improvements. See #389. |
| ResultSetMaxRows | 10000 | Cuts off results at this value. Increase to allow users to make very large queries. |
| MaxConstructTriples | 10000 | Similar to the above, restricts the maximum size of a CONSTRUCT result. |
| MaxQueryCostEstimationTime | 400 (seconds) | This caps how long Virtuoso will spend estimating a query's cost before execution. Reduce to reject costly queries faster. Raise if we have very complex federated/inference rules that legitimately take longer to plan. |
| SQL_QUERY_TIMEOUT | 0 (unlimited) | Same as MaxQueryExecutionTime above (adjusted from 60 to 900 seconds) but client side. Leaving it as unlimited because MaxQueryExecutionTime should kick in first. Don't feel strongly either way about this one. |