Installing - ge-semtk/semtk GitHub Wiki
These instructions get the SemTK services and UI up and running. They assume Linux, but SemTK can run on Windows as well. An easy way to do this on Windows is to run the commands shown on this page inside a bash shell such as Git Bash, which is included in the Git for Windows distribution.
Prerequisite: Install a triple store
Install a triple store such as Virtuoso or Fuseki.
Install Fuseki
Fuseki is the recommended triple store for SemTK. The latest distribution is at https://jena.apache.org/download/index.cgi.
Startup instructions are at https://jena.apache.org/documentation/fuseki2/fuseki-quick-start.html
Create a dataset (e.g. named "SemTK") that persists across Fuseki restarts.
Or, Install Virtuoso
Virtuoso is available through OpenLink Software. Installation instructions are at http://virtuoso.openlinksw.com/howto/
Prerequisite: Install a web server
Install a web server such as Apache Tomcat or Apache HTTP Server (httpd).
Create a directory (referred to below as WEBAPPS) within your web server for the SemTK web app.
- Example for Tomcat:
/no_backup/tomcat/apache-tomcat-8.0.18/webapps/semtk
- Example for httpd:
/var/www/html
Install SemTK from source code or binary distribution
Create a directory (referred to below as SEMTK) for SemTK.
If updating/replacing an existing SemTK installation, be sure to save the existing ENV_OVERRIDE file.
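One way to preserve the existing file before replacing an installation is to copy it aside with a timestamp. This is only a minimal sketch: the `backup_env_override` helper and the backup file naming are illustrative assumptions, not part of SemTK, and the directory holding ENV_OVERRIDE is passed in rather than assumed.

```shell
# Hypothetical helper (not part of SemTK): copy ENV_OVERRIDE aside before an upgrade.
# $1 = directory that contains ENV_OVERRIDE; prints the path of the backup copy.
backup_env_override() {
    src="$1/ENV_OVERRIDE"
    if [ ! -f "$src" ]; then
        echo "no ENV_OVERRIDE found in $1" >&2
        return 1
    fi
    # Timestamped name so repeated upgrades do not overwrite earlier backups.
    backup="$src.$(date +%Y%m%d-%H%M%S).bak"
    cp -p "$src" "$backup" && echo "$backup"
}
```

It could be called as `backup_env_override SEMTK/semtk-opensource` before unpacking a new distribution, and the saved file copied back afterwards.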
To install SemTK from source code
If you need to install Git, this might work for you:
$ sudo yum install git
$ git config --global user.name "Your Name"
$ git config --global user.email "[email protected]"
If you need to install Maven, this might work for you:
$ sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
$ sudo sed -i s/\$releasever/6/g /etc/yum.repos.d/epel-apache-maven.repo
$ sudo yum install -y apache-maven
Clone and build SemTK:
$ cd SEMTK
$ git clone https://github.com/ge-semtk/semtk.git
$ cd semtk
$ mvn clean install -DskipTests
To install SemTK from binary distribution
- Download the binary distribution file (e.g. semtk-opensource-*-dist.tar.gz) from GitHub Releases to the SEMTK directory
- Unzip/untar the binary distribution file, which will create SEMTK/semtk-opensource
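The unpack step can be sketched as a small shell function. The `unpack_dist` name is a hypothetical helper, not something shipped with SemTK; it only wraps standard `tar` options and then checks for the semtk-opensource directory that the step above says will be created.

```shell
# Hypothetical helper: unpack a SemTK binary distribution into the SEMTK directory.
# $1 = path to the *-dist.tar.gz file, $2 = SEMTK directory.
unpack_dist() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    # -x extract, -z gunzip, -f archive file, -C switch to the destination first
    tar -xzf "$archive" -C "$dest" || return 1
    # Unpacking is expected to create SEMTK/semtk-opensource; fail if it did not.
    [ -d "$dest/semtk-opensource" ]
}
```

For example, `unpack_dist semtk-opensource-*-dist.tar.gz SEMTK` returns a non-zero status if the archive did not produce the expected directory.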
Create a local config (ENV_OVERRIDE) file
A default configuration file (.env) can be found in the top-level semtk-opensource directory. Typically, some of the settings in this file will need to be overridden for the local environment. This should be done by creating a file called ENV_OVERRIDE (do not change the .env file). Some common ENV_OVERRIDE entries are as follows:
To start only a subset of the SemTK services (this example represents the 10 core SemTK services):
export ENABLED_SERVICES="nodeGroupExecutionService nodeGroupService nodeGroupStoreService ontologyInfoService sparqlExtDispatchService sparqlGraphIngestionService sparqlGraphResultsService sparqlGraphStatusService sparqlQueryService utilityService"
To change the temporary results file directory:
export resultsFileLocation=/directory12345/semtk-results
If you are using Fuseki, your ENV_OVERRIDE should contain these settings:
export SERVICES_DATASET_SERVER_URL=http://localhost:3030/SemTK
export SERVICES_DATASET_ENDPOINT_TYPE=fuseki
Note: the ENV_OVERRIDE file will not be changed if the SemTK code is updated from Git (e.g. with a git pull).
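Put together, a Fuseki-based ENV_OVERRIDE could be created with something like the sketch below. The `write_env_override` function is a hypothetical helper, and the target directory is an argument because the right location depends on your installation; the entries themselves are the ones listed above. The helper refuses to overwrite an existing file.

```shell
# Hypothetical helper: create an ENV_OVERRIDE file in the directory given as $1.
write_env_override() {
    target="$1/ENV_OVERRIDE"
    if [ -e "$target" ]; then
        echo "$target already exists; not overwriting" >&2
        return 1
    fi
    # Quoted heredoc delimiter: the text is written literally, with no expansion.
    cat > "$target" <<'EOF'
# Local overrides; the .env file itself stays untouched.
export resultsFileLocation=/directory12345/semtk-results
export SERVICES_DATASET_SERVER_URL=http://localhost:3030/SemTK
export SERVICES_DATASET_ENDPOINT_TYPE=fuseki
EOF
}
```

Sourcing the result in a subshell, e.g. `( . ./ENV_OVERRIDE )`, is a quick way to confirm that the file parses before starting the services.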
Start the SemTK services:
$ ./startServices.sh
Install the SemTK UI (SPARQLgraph):
Install the SemTK UI to your web server with the following command, where WEBAPPS is the web server directory described above:
$ ./updateWebapps.sh WEBAPPS
Test that the UI is working by opening my.machine.com/sparqlGraph/index.html in a browser.
Optionally try the "Hello World" demo.
Working with a reverse proxy
If your web machine can only be reached on ports such as 80, 8080, or 443, you’ll need to use a reverse proxy.
There are many ways to do this, but here are some example lines for a reverse proxy .conf file (e.g. /etc/httpd/conf.d/default-site.conf):
ProxyPass /sparqlquery http://127.0.0.1:12050/
ProxyPassReverse /sparqlquery http://127.0.0.1:12050/
ProxyPass /ingestion http://127.0.0.1:12091/
ProxyPassReverse /ingestion http://127.0.0.1:12091/
In this case, the services are running on the same machine as the web server. If they are running somewhere else, use that URL or IP instead of 127.0.0.1. Your configuration file will already have a line for:
ProxyPass / http://127.0.0.1:8080/
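(though it might not point to port 8080). In any event, make sure the new lines are inserted into the reverse proxy config file before this default line: Apache checks ProxyPass rules in configuration order and the first match wins, so a catch-all "/" rule placed earlier would capture the service paths.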
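The ordering requirement is easy to get wrong when editing by hand, so a small check can help. The following is a sketch only: `check_proxy_order` is a hypothetical helper, and it inspects plain `ProxyPass` lines in a single file (not `ProxyPassReverse`, `ProxyPassMatch`, or included files).

```shell
# Hypothetical helper: report ProxyPass rules that come after the catch-all "/" rule.
# $1 = reverse proxy .conf file. Returns non-zero if any rule is out of order.
check_proxy_order() {
    awk '
        # Remember where the first catch-all rule appears.
        $1 == "ProxyPass" && $2 == "/" { if (!catchall) catchall = NR; next }
        # Any more specific ProxyPass after it would never be reached.
        $1 == "ProxyPass" && catchall {
            printf "line %d: %s is after the catch-all on line %d\n", NR, $2, catchall
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Running `check_proxy_order /etc/httpd/conf.d/default-site.conf` prints one line per misplaced rule and exits with status 1 if it finds any.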
When using a reverse proxy, the URLs in the “Configuration” step above would change to use the new proxied paths instead of the port numbers:
url : "http://my.machine.ge.com/ingestion/ingestion/",
url : "http://my.machine.ge.com/sparqlquery/sparqlQueryService/",
Installing on AWS
Set up an Apache web server (httpd); see the AWS docs.
- Start httpd: sudo service httpd start
- Find the root directory for the server: grep DocumentRoot /etc/httpd/conf/httpd.conf. We'll refer to this directory (e.g. /var/www/html) as WEBAPPS
- Set up /etc/httpd/conf.d/default-site.conf as outlined above in the "proxy" section.
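The grep above also prints commented lines that mention DocumentRoot. As a sketch, the active value can be pulled out with awk instead; `find_document_root` is a hypothetical helper and assumes the path contains no spaces.

```shell
# Hypothetical helper: print the first active DocumentRoot value in an httpd config.
# $1 = path to httpd.conf. Commented-out lines are skipped because their first
# field is "#" (or starts with "#") rather than "DocumentRoot".
find_document_root() {
    awk '$1 == "DocumentRoot" { gsub(/"/, "", $2); print $2; exit }' "$1"
}
```

For example, `WEBAPPS=$(find_document_root /etc/httpd/conf/httpd.conf)` captures the directory for use in the later steps.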
Download the binary distribution file, move it to the AWS EC2 instance, and unzip it (all as described above).
The ENV_OVERRIDE file should look something like this:
# I needed to copy this whole folder to the node
export storeTemplateLocation=/run/semtk/semtk-opensource/sparqlGraphLibrary/src/main/resources/nodegroups/store.json
# TODO: I needed to create this folder
export resultsFileLocation=/tmp/DISPATCH_RESULTS
# TODO: on this host I can't find a name that works
export HOST_IP=10.200.100.200
export WEB_INGESTION_HOST=${HOST_IP}
export WEB_SPARQL_QUERY_HOST=${HOST_IP}
export WEB_STATUS_HOST=${HOST_IP}
export WEB_RESULTS_HOST=${HOST_IP}
export WEB_DISPATCH_HOST=${HOST_IP}
export WEB_HIVE_HOST=${HOST_IP}
export WEB_NODEGROUPSTORE_HOST=${HOST_IP}
export WEB_ONTOLOGYINFO_HOST=${HOST_IP}
export WEB_NODEGROUPEXECUTION_HOST=${HOST_IP}
export WEB_NODEGROUP_HOST=${HOST_IP}
# set the ports to use a proxy
export WEB_NODEGROUP_PORT=80/nodegroup
export WEB_INGESTION_PORT=80/ingestion
export WEB_SPARQL_QUERY_PORT=80/sparqlquery
export WEB_STATUS_PORT=80/status
export WEB_RESULTS_PORT=80/results
export WEB_HIVE_PORT=80/hive
export WEB_DISPATCH_PORT=80/dispatch
export WEB_NODEGROUPSTORE_PORT=80/nodegroupstore
export WEB_ONTOLOGYINFO_PORT=80/ontologyinfo
export WEB_NODEGROUPEXECUTION_PORT=80/nodegroupexec
# this is the only way to load a GE-specific variable that is needed in semtk-oss
export DISPATCHER_CLASS_NAME=com.ge.research.semtk.sparqlX.dispatch.EdcDispatcher
# set this to FQDN in order to get maximum speed to DGX within GE network
export resultsBaseURL=http://10.200.100.200/${PORT_SPARQLGRAPH_RESULTS_SERVICE}
TODO: I needed to edit semtk-opensource .fun because the host function didn't work
function sethostname
{
export HOST_NAME=$(hostname)
}
Install the SemTK UI, as described above.
Start the SemTK Services, as described above.
SemTK Docker image
Here are instructions to build and run a SemTK Docker image: https://github.com/ge-semtk/semtk/blob/master/deploy/README.md
Google Analytics
To optionally attach Google Analytics to your SemTK UI (SPARQLgraph) installation:
- Create a Google Analytics account
- Get your Tracking ID from Google
- Download googleAnalyticsLogger.js
- In googleAnalyticsLogger.js, replace YOUR_GOOGLE_ANALYTICS_TRACKING_ID with your tracking id from Google
- Copy the file to your webapps, overwriting sparqlForm/main-oss/KDLEasyLoggerConfig.js
- Copy the file to your webapps, overwriting sparqlGraph/main-oss/KDLEasyLoggerConfigOss.js
Reload your web page and Google Analytics data will begin flowing.
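The replace-and-copy steps above can be scripted. This is a minimal sketch: `install_ga_logger` is a hypothetical helper, it assumes the two target directories already exist under WEBAPPS (as they would after running updateWebapps.sh), and it assumes the tracking ID contains no characters that are special to sed, such as "/" or "&".

```shell
# Hypothetical helper (not part of SemTK).
# $1 = downloaded googleAnalyticsLogger.js, $2 = WEBAPPS directory, $3 = tracking ID
install_ga_logger() {
    src="$1"
    webapps="$2"
    tracking_id="$3"
    for target in \
        "$webapps/sparqlForm/main-oss/KDLEasyLoggerConfig.js" \
        "$webapps/sparqlGraph/main-oss/KDLEasyLoggerConfigOss.js"
    do
        # Substitute the placeholder and overwrite the logger config in one step;
        # the downloaded file itself is left unmodified.
        sed "s/YOUR_GOOGLE_ANALYTICS_TRACKING_ID/$tracking_id/g" "$src" > "$target" || return 1
    done
}
```

It would be invoked as `install_ga_logger googleAnalyticsLogger.js WEBAPPS <your tracking id>`.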