Compiling and configuring Virtuoso - dkmfbk/knowledgestore GitHub Wiki
This page collects brief notes on compiling, configuring and using the Virtuoso triple store with the KnowledgeStore. The notes are based on Virtuoso version 7.2 (but also apply to version 7.1).
Compiling Virtuoso
Download the Virtuoso sources from GitHub: https://github.com/openlink/virtuoso-opensource (clone or download the zip), then enter the top-level source folder and execute:
export CFLAGS="-O2 -m64"
./autogen.sh
./configure --with-readline \
--prefix=PATH_TO_INSTALL_DIR \
--with-jdk4_1=PATH_TO_JDK7_DIR \
--program-transform-name="s/isql/isql-vt/"
make
make check # still broken
make install
Notes:
- the --prefix configure flag is necessary to install Virtuoso in a specific directory (e.g., /opt/virtuoso-7.2); without it, Virtuoso binaries will be put under /usr/bin and so on; depending on this setting, it may be necessary to execute make install as root;
- the --program-transform-name flag is necessary to avoid name clashes between the isql tool shipped with Virtuoso and the one that may already be installed on the machine;
- the --with-readline flag is necessary to compile Virtuoso with readline support (note: the command line client is almost unusable without readline!);
- expect a lot of warnings to be logged during make; this is fine;
- the make check command is optional; it runs the compiled binaries against a test suite; a number of tests fail, but this is fine.
Configuring Virtuoso
See also http://docs.openlinksw.com/virtuoso/databaseadmsrv.html
Basic configuration (parameters must be set correctly for Virtuoso to start at all):
- DatabaseFile, ErrorLogFile, LockFile, TransactionFile, xa_persistent_file in section [Database];
- DatabaseFile and TransactionFile in section [TempDatabase];
- ServerPort, DirsAllowed, VADInstallDir in section [Parameters];
- ServerPort, ServerRoot, HTTPLogFile in section [HTTPServer];
- LoadPath, LoadNNN in section [Plugins] (some LoadNNN lines may be disabled if the corresponding plugins are not used).
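The checklist above can be verified mechanically before starting the server. The following is a minimal sketch, not part of Virtuoso or the KnowledgeStore: the function name missing_keys and the sample paths are hypothetical, and it assumes the configuration file can be parsed by Python's standard configparser module. LoadNNN entries are not checked, since which ones are needed depends on the plugins in use.

```python
import configparser

# Keys listed above as mandatory, grouped by section of virtuoso.ini.
REQUIRED_KEYS = {
    "Database": ["DatabaseFile", "ErrorLogFile", "LockFile",
                 "TransactionFile", "xa_persistent_file"],
    "TempDatabase": ["DatabaseFile", "TransactionFile"],
    "Parameters": ["ServerPort", "DirsAllowed", "VADInstallDir"],
    "HTTPServer": ["ServerPort", "ServerRoot", "HTTPLogFile"],
    "Plugins": ["LoadPath"],
}


def missing_keys(ini_text):
    """Return a sorted list of 'Section.Key' entries absent from ini_text."""
    # interpolation=None keeps '%' characters in values literal;
    # strict=False tolerates repeated keys or sections.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        for key in keys:
            # Section names are matched exactly; key names ignore case.
            if not parser.has_option(section, key):
                missing.append(f"{section}.{key}")
    return sorted(missing)


# Hypothetical, deliberately incomplete configuration used for illustration.
SAMPLE = """\
[Database]
DatabaseFile = /opt/virtuoso-7.2/db/virtuoso.db
ErrorLogFile = /opt/virtuoso-7.2/db/virtuoso.log

[Parameters]
ServerPort = 1111
"""

for entry in missing_keys(SAMPLE):
    print("missing:", entry)
```

To check a real file, read its content (for example with open(path).read()) and pass the text to missing_keys.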
Setting Memory
To set the right amount of memory, run "status();" from isql-vt and compute pages - free (the total number of pages minus the number of free pages reported in the output).
Set NumberOfBuffers to the result, slightly increased. Set MaxDirtyBuffers to 3/4 of NumberOfBuffers.
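The arithmetic above can be sketched as follows. This is only an illustration: the function name buffer_settings and the 10% margin are assumptions (the notes only say "slightly increased"), and the input figures are made up rather than taken from a real status() output. The 8 KB page size is the one stated in these notes.

```python
PAGE_SIZE_BYTES = 8 * 1024  # Virtuoso buffers are 8 KB pages


def buffer_settings(pages, free, headroom_percent=10):
    """Derive (NumberOfBuffers, MaxDirtyBuffers) from status() page counts.

    headroom_percent is an assumed safety margin, not a Virtuoso setting.
    """
    used = pages - free
    if used < 0:
        raise ValueError("free cannot exceed pages")
    # Integer arithmetic avoids floating-point rounding surprises.
    number_of_buffers = used + used * headroom_percent // 100
    max_dirty_buffers = number_of_buffers * 3 // 4
    return number_of_buffers, max_dirty_buffers


# Made-up example figures: 1,000,000 pages in total, 200,000 of them free.
buffers, dirty = buffer_settings(1_000_000, 200_000)
print("NumberOfBuffers =", buffers)
print("MaxDirtyBuffers =", dirty)
print("Approx. cache size in bytes:", buffers * PAGE_SIZE_BYTES)
```

The last line gives the memory the buffer cache would occupy, which should be compared against the RAM actually available on the machine.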
SPARQL configuration (section [SPARQL]):
- MaxQueryCostEstimationTime (default 4000 seconds). Better not to set this parameter, as estimated execution times may be wrong and valid queries may be rejected for that reason.
- MaxQueryExecutionTime. Set to a large value if analytical queries need to be run. The value set here is a hard constraint that prevails over any timeout passed by the client (including the KS).
- ResultSetMaxRows. Set to a large value if dump queries need to be run (keep in mind the hard 1M limit of the SPARQL HTTP endpoint).
- DefaultQuery. Can be changed to something more informative like SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }.
Performance optimization (section [Parameters]):
- VectorSize (default 1000). This roughly controls how many rows are processed together by the query processor. This parameter greatly affects performance, but on a per-query basis (i.e., some queries prefer a small value, others a large one). The default 1000 is a good tradeoff that may be slightly increased, but only based on experiments.
- MaxQueryMem (default 2G) & HashJoinSpace. The first parameter controls the amount of memory that is constantly allocated for the query processor (more memory can be used and then released if necessary, which causes some overhead). Increasing it leads to a small increase in performance, especially for slow queries. HashJoinSpace controls the fraction of MaxQueryMem that can be used for hash joins. It seems advisable to set it equal to MaxQueryMem.
- AdjustVectorSize (default 0) & MaxVectorSize. If set to 1, AdjustVectorSize allows VectorSize to be increased adaptively for queries that require a higher value; in that case, MaxVectorSize is the maximum value that VectorSize can be set to (1000000 is the suggested value). AdjustVectorSize = 1 should be beneficial in theory (as suggested in the Virtuoso docs), but we noticed a significant decrease in performance when enabling it, so it seems better to leave it disabled as in the default Virtuoso configuration.
- ThreadsPerQuery (default 4) and AsyncQueueMaxThreads (default 10). They control the maximum number of additional threads that can be allocated to a query and their sum across all queries (i.e., the pool size). They have little impact on fast queries, but it seems better to set both of them to the number of CPU threads, as this may improve the execution of slow, complex queries.
- ServerThreads (default 20). It must be set at least to the number of CPU threads, preferably somewhat higher to leave some margin (we do not know whether these threads are used exclusively for client queries or also for internal tasks; to stay on the safe side, we set this parameter to twice the number of CPU threads).
- NumberOfBuffers & MaxDirtyBuffers. The first is the number of 8 KB pages for storing (caching) the DB in memory. Memory permitting, it should be set to a number larger than the number of pages used by the DB (use isql-vt and the status() command to compute it). The second parameter is relevant only when data is modified, and its suggested value is 3/4 of the first.
- DefaultIsolation. If the database is read-only, it seems natural to set it to 1 (= READ UNCOMMITTED, i.e., no transactional guarantee) to avoid any synchronization overhead.
- MaxMemPoolSize (default 100000000). This is the maximum memory used by the query planner. A larger value (200000000) is suggested by some sources on the Internet.
- O_DIRECT (default 0). It controls whether OS file buffering is used or skipped when accessing the DB. The default value 0 (use buffering) seems to be faster.
Configuring the KnowledgeStore for interfacing with Virtuoso
A template for the configuration of the TripleStore internal component is the following:
<obj:tripleStore>
    a <java:eu.fbk.knowledgestore.triplestore.SynchronizedTripleStore> ;
    :synchronizerSpec "NUM_CPU_THREADS_HERE:0" ;
    :delegate [
        a <java:eu.fbk.knowledgestore.triplestore.LoggingTripleStore> ;
        :delegate [
            a <java:eu.fbk.knowledgestore.triplestore.virtuoso.VirtuosoJdbcTripleStore> ;
            :host "VIRTUOSO_HOST_HERE" ;
            :port "VIRTUOSO_PORT_HERE" ;
            :username "VIRTUOSO_USERNAME" ;
            :password "VIRTUOSO_PASSWORD" ;
            :fetchSize 200 ;
        ]
    ] .
The important point is to use the VirtuosoJdbcTripleStore driver, which offers better performance.
It is also important to fine-tune the KnowledgeStore thread pool, whose size should be larger than # CPU threads + 2 * (# HTTP server acceptors + # HTTP server selectors). Note that # acceptors and # selectors can be safely set to 1 unless a very large number of concurrent client connections is expected.
The relevant configuration fragment is:
<obj:launcher>
    :threadCount NUM_OF_THREADS_IN_THE_POOL_HERE ;
    ...
<obj:httpServer>
    ...
    :acceptors NUM_HTTP_ACCEPTOR_THREADS_HERE ;
    :selectors NUM_HTTP_SELECTOR_THREADS_HERE ;
    ...
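As a worked example of the thread pool sizing rule above, the following sketch computes the smallest threadCount that satisfies it. The function name min_thread_count is hypothetical, and "larger than" is read here as strictly greater, hence the added 1.

```python
def min_thread_count(cpu_threads, acceptors=1, selectors=1):
    """Smallest pool size strictly larger than
    cpu_threads + 2 * (acceptors + selectors)."""
    return cpu_threads + 2 * (acceptors + selectors) + 1


# Example: 8 CPU threads with the default single acceptor and selector.
print(":threadCount", min_thread_count(8), ";")
```

With 8 CPU threads, 1 acceptor and 1 selector, the bound is 8 + 2 * (1 + 1) = 12, so the pool needs at least 13 threads.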