Installing Fuseki with the jena-text extension
Jena Fuseki is a SPARQL server and triple store which is the recommended backend for Skosmos. The jena-text extension can be used for faster text search. Fuseki 1.0.1 or later is recommended, because it includes graph-specific indexing. Fuseki 1.3.0+ or 2.3.0+ is required for Skosmos 1.4 and above. NOTE: Fuseki 1.3.1 and 2.3.1 have a bug which affects Skosmos so they are not recommended.
You will need:
- a recent Fuseki distribution (look for the newest jena-fuseki-*-distribution.tar.gz): http://www.apache.org/dist/jena/binaries/
Unpack the Fuseki distribution:
tar xzf jena-fuseki-*-distribution.tar.gz
cd jena-fuseki-*/    # directory name depends on the version; snapshot builds end in -SNAPSHOT
If all went well, you can test Fuseki by running ./fuseki-server --mem /ds, which starts the server with an empty in-memory dataset named /ds.
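As a quick check that the test server is answering, something like the following could be run from a second terminal. This is only a sketch: it assumes Fuseki is listening on its default port 3030, that curl is installed, and that the query endpoint of the /ds dataset is /ds/query. The FUSEKI_URL variable and the fuseki_ask function are names invented for this example.

```shell
# Hypothetical helper: send a trivial ASK query to a Fuseki dataset.
# FUSEKI_URL is an assumed variable name; 3030 is Fuseki's default port.
FUSEKI_URL="${FUSEKI_URL:-http://localhost:3030/ds}"

fuseki_ask() {
    # -G sends the url-encoded data as a GET query string;
    # -f makes curl exit non-zero on HTTP errors, -s hides the progress bar.
    curl -sf -G "$FUSEKI_URL/query" \
        --data-urlencode 'query=ASK {}' \
        -H 'Accept: application/sparql-results+json'
}
```

Calling fuseki_ask should print a small JSON document whose boolean value is true when the server is up. A connection error suggests that Fuseki is not running or is listening on a different port.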
To use the index, you will need to run Fuseki with a configuration file, such as the one below. The example is based on the jena-text example configuration but has the following edits:
- add a graph index
- index the properties skos:prefLabel, skos:altLabel, skos:hiddenLabel and skos:notation instead of rdfs:label
- set TDB location to /tmp/tdb (change this to where you want to keep the TDB store)
- set Lucene index location to /tmp/lucene (change this to where you want to keep the Lucene index)
## Example of a TDB dataset and text index published using Fuseki

@prefix : <#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text: <http://jena.apache.org/text#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

[] rdf:type fuseki:Server ;
    fuseki:services (
        <#service_text_tdb>
    ) .

# TDB
[] ja:loadClass "org.apache.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .

# Text
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset rdfs:subClassOf ja:RDFDataset .
#text:TextIndexSolr rdfs:subClassOf text:TextIndex .
text:TextIndexLucene rdfs:subClassOf text:TextIndex .

## ---------------------------------------------------------------

<#service_text_tdb> rdf:type fuseki:Service ;
    rdfs:label "TDB/text service" ;
    fuseki:name "ds" ;
    fuseki:serviceQuery "query" ;
    fuseki:serviceQuery "sparql" ;
    fuseki:serviceUpdate "update" ;
    fuseki:serviceUpload "upload" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:serviceReadWriteGraphStore "data" ;
    fuseki:dataset <#text_dataset> ;
    .

<#text_dataset> rdf:type text:TextDataset ;
    text:dataset <#dataset> ;
    ##text:index <#indexSolr> ;
    text:index <#indexLucene> ;
    .

<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "/tmp/tdb" ;
    tdb:unionDefaultGraph true ;
    .

<#indexSolr> a text:TextIndexSolr ;
    #text:server <http://localhost:8983/solr/COLLECTION> ;
    text:server <embedded:SolrARQ> ;
    text:entityMap <#entMap> ;
    .

<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:/tmp/lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    text:storeValues true ; ## required for Skosmos 1.4
    .

# Text index configuration for Skosmos 1.4 and above (requires Fuseki 1.3.0+ or 2.3.0+)
<#entMap> a text:EntityMap ;
    text:entityField "uri" ;
    text:graphField "graph" ; ## enable graph-specific indexing
    text:defaultField "pref" ; ## Must be defined in the text:map
    text:uidField "uid" ; ## recommended for Skosmos 1.4+
    text:langField "lang" ; ## required for Skosmos 1.4
    text:map (
        # skos:prefLabel
        [ text:field "pref" ;
          text:predicate skos:prefLabel ;
          text:analyzer [ a text:LowerCaseKeywordAnalyzer ]
        ]
        # skos:altLabel
        [ text:field "alt" ;
          text:predicate skos:altLabel ;
          text:analyzer [ a text:LowerCaseKeywordAnalyzer ]
        ]
        # skos:hiddenLabel
        [ text:field "hidden" ;
          text:predicate skos:hiddenLabel ;
          text:analyzer [ a text:LowerCaseKeywordAnalyzer ]
        ]
        # skos:notation
        [ text:field "notation" ;
          text:predicate skos:notation ;
          text:analyzer [ a text:LowerCaseKeywordAnalyzer ]
        ]
    ) .
Save this as jena-text-config.ttl. You can then run Fuseki with ./fuseki-server --config jena-text-config.ttl
To make Fuseki use this configuration file by default, add the following line to /etc/environment: FUSEKI_CONF="/actual/full/path/goes/here/jena-text-config.ttl"
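Once Fuseki is running with this configuration and some SKOS data has been loaded, the text index can be exercised with a jena-text query. The sketch below is one way to do that from the shell. It assumes the default port 3030, the dataset name "ds" and the "sparql" query service from the configuration above, and that curl is installed. ENDPOINT, text_query and search_labels are names made up for this example, and the search term is not escaped, so it should not be fed untrusted input.

```shell
# Assumed endpoint: dataset "ds", query service "sparql", default port 3030.
ENDPOINT="${ENDPOINT:-http://localhost:3030/ds/sparql}"

# Print a SPARQL query that searches the skos:prefLabel field of the
# text index for the Lucene query string given as $1 (inserted verbatim).
text_query() {
    cat <<EOF
PREFIX text: <http://jena.apache.org/text#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {
  ?concept text:query (skos:prefLabel '$1') ;
           skos:prefLabel ?label .
} LIMIT 10
EOF
}

# Send the query to Fuseki and print the JSON result set.
search_labels() {
    curl -sf -G "$ENDPOINT" \
        --data-urlencode "query=$(text_query "$1")" \
        -H 'Accept: application/sparql-results+json'
}
```

For example, search_labels 'cat*' asks for up to ten concepts whose preferred label starts with "cat". Because LowerCaseKeywordAnalyzer indexes each label as a single lowercased token, the search term is best written in lowercase.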
The above configuration is the suggested starting point but other configurations can be used as well, for example when other applications are also using the jena-text index. Skosmos (version 1.4+) has the following requirements for the jena-text configuration:
- The properties skos:prefLabel, skos:altLabel and skos:hiddenLabel MUST be indexed. They SHOULD be configured with different field names. Other properties SHOULD NOT be configured to share the same field name with these properties.
- Alternative analyzer configurations can be used instead of LowerCaseKeywordAnalyzer, but the analyzer MUST be case-insensitive.
- The analyzer configuration SHOULD be the same for all SKOS properties (prefLabel, altLabel and hiddenLabel).
- text:storeValues MUST be true.
- text:langField MUST be set to a unique field name.
- text:graphField SHOULD be set to a unique field name.
- text:uidField SHOULD be set to a unique field name.
- The text:defaultField setting is not used by Skosmos but jena-text itself requires that it MUST be set to one of the configured field names.
In the above requirements, "MUST", "SHOULD" and "SHOULD NOT" are to be interpreted according to RFC 2119. In practice, the performance of Skosmos may not be optimal if the "SHOULD" and "SHOULD NOT" requirements are not followed.
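When adapting the configuration, a rough automated check of the listed settings can catch simple omissions. The function below is a sketch only: check_jena_text_config is a hypothetical helper, not something shipped with Fuseki or Skosmos. It matches text with grep rather than parsing Turtle, and it does not verify that field names are unique or that the analyzers are case-insensitive.

```shell
# Hypothetical helper (not part of Fuseki or Skosmos): grep a jena-text
# configuration file for the settings Skosmos 1.4+ expects. This is a
# plain text match, not a Turtle parser, so treat the result as a hint.
check_jena_text_config() {
    conf="$1"
    status=0
    # Ignore lines that consist only of a comment.
    active=$(grep -v '^[[:space:]]*#' "$conf")

    # MUST: the three SKOS label properties are indexed.
    for prop in skos:prefLabel skos:altLabel skos:hiddenLabel; do
        if ! printf '%s\n' "$active" | grep -Eq "text:predicate[[:space:]]+$prop"; then
            echo "MUST: $prop is not indexed"
            status=1
        fi
    done

    # MUST: stored values and a language field.
    if ! printf '%s\n' "$active" | grep -Eq 'text:storeValues[[:space:]]+true'; then
        echo "MUST: text:storeValues is not true"
        status=1
    fi
    if ! printf '%s\n' "$active" | grep -Eq 'text:langField[[:space:]]+"'; then
        echo "MUST: text:langField is not set"
        status=1
    fi

    # SHOULD: graph and uid fields (reported, but not treated as errors).
    for field in text:graphField text:uidField; do
        if ! printf '%s\n' "$active" | grep -Eq "$field[[:space:]]+\""; then
            echo "SHOULD: $field is not set"
        fi
    done

    return "$status"
}
```

A call such as check_jena_text_config jena-text-config.ttl prints one line per missing setting and returns a non-zero status only when a MUST item is missing.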
See FusekiTuning for tips on tuning Fuseki for production use.