Server Install - janpolowinski/dswarm-documentation GitHub Wiki
These are the installation instructions for a server and will add some steps that aim at a production environment, which are not described in the [Developer-Install](developer install).
Initial Installation
premise:
- currently, we've deployed the application only on Ubuntu Linux distributions
- empty trusty (14.04)
- three additional partitions (for easier increasing the sizes of the partitions separately): -- /data/log, 1-5GB depending on usage -- /data/mysql, up to 5GB -- /data/neo4j, up to 20GB
- let the $HOME of the less privileged user be '/home/user'
- all commands boxes start as the less privileged user and in their $HOME.
- Requiring root is explicitly marked (with
surather thansudo ...; note that using sudo before each line is not sufficient to yield the same results as with su. Using su may require you to set a password for root if not already done!)
Note: some commands require user input, this is no unattended installation
This step requires root level access
1. install system packages required for building the software
su
apt-get install --no-install-recommends --yes git-core maven nodejs npm build-essential
These steps require less privileged access
2. clone repositories (not as root!)
lookout for the correct path (/home/user)
cd /home/user
git clone --depth 1 --branch builds/unstable https://github.com/dswarm/dswarm.git
git clone --depth 1 --branch master https://github.com/dswarm/dswarm-graph-neo4j.git
git clone --depth 1 --branch builds/unstable https://github.com/dswarm/dswarm-backoffice-web.git
These steps require root level access
3. install Java and Tomcat for the backend
- D:SWARM requires Java 8, which is no longer available in the default package sources. Follow these steps
su
add-apt-repository ppa:webupd8team/java
apt-get update
apt-get install oracle-java8-installer oracle-java8-set-default
You can verify your java version with
java -version 2>&1 | grep -q "1.8" && echo "OK, Java 8 is available" || echo "Uh oh, Java 8 is not available"
Earlier versions of Tomcat (< 7.0.30) do not run with Java 8 albeit being advertised to do so (related to this bug). Ubuntu 12.04 Precise includes Tomcat in version 7.0.26, which therefore must be updated. If you run precise, execute these steps:
wget https://launchpad.net/ubuntu/+archive/primary/+files/libservlet3.0-java_7.0.52-1ubuntu0.1_all.deb
wget https://launchpad.net/ubuntu/+archive/primary/+files/libtomcat7-java_7.0.52-1ubuntu0.1_all.deb
wget https://launchpad.net/ubuntu/+archive/primary/+files/tomcat7-admin_7.0.52-1ubuntu0.1_all.deb
wget https://launchpad.net/ubuntu/+archive/primary/+files/tomcat7-common_7.0.52-1ubuntu0.1_all.deb
wget https://launchpad.net/ubuntu/+archive/primary/+files/tomcat7_7.0.52-1ubuntu0.1_all.deb
su
dpkg -i libservlet3.0-java_7.0.52-1ubuntu0.1_all.deb
dpkg -i libtomcat7-java_7.0.52-1ubuntu0.1_all.deb
dpkg -i tomcat7-common_7.0.52-1ubuntu0.1_all.deb
dpkg -i tomcat7-admin_7.0.52-1ubuntu0.1_all.deb
dpkg -i tomcat7_7.0.52-1ubuntu0.1_all.deb
If you run a more recent version, just install tomcat from the official sources:
su
apt-get install tomcat7
4. install system packages required for running the software
su
apt-get install --no-install-recommends --yes mysql-server nginx curl
5. install Data Hub (Neo4j)
currently, we rely on Neo4j version 2.2.2
su
wget -O - http://debian.neo4j.org/neotechnology.gpg.key | apt-key add -
echo 'deb http://debian.neo4j.org/repo stable/' > /etc/apt/sources.list.d/neo4j.list
apt-get update
apt-get install --no-install-recommends --yes neo4j=2.2.2
You can open the Neo4j Browser at http://localhost:7474/browser/ to check that the correct version has been installed.
Make sure Neo4j does not get updated when updating packages. You can use apt-pinning to do so. As root, create a file
su
touch /etc/apt/preferences.d/neo4j.pref
and add the following lines to this file.
Package: neo4j
Pin: version 2.2.2
Pin-Priority: 1000
6. make sure, permissions are correctly
su
chown -R tomcat7:tomcat7 /data/log
chown -R mysql:mysql /data/mysql
chown -R neo4j:adm /data/neo4j
7. install build environment for frontend
su
ln -s /usr/bin/nodejs /usr/bin/node
npm install -g grunt-cli karma bower
8. setup Metadata Repository (MySQL)
Create a database and a user for d:swarm. To customize the settings, edit dswarm/persistence/src/main/resources/create_database.sql. Do not check in this file in case you modify it. Hint: remember settings for step 13 (configure d:swarm).
mysql -uroot -p < dswarm/persistence/src/main/resources/create_database.sql
Then, open /etc/mysql/my.cnf and add the following line to the section [mysqld] (around line 45)
wait_timeout = 1209600
in the same file, same sections, change datadir to /data/mysql (around line 40)
datadir = /data/mysql
add this directory to AppArmor
su
echo "alias /var/lib/mysql/ -> /data/mysql/," >> /etc/apparmor.d/tunables/alias
/etc/init.d/apparmor reload
and copy whole MySQL data directory to new location (after stopping the mysql service)
service mysql stop
cp -pr /var/lib/mysql/* /data/mysql/
9. setup Nginx
edit /etc/nginx/sites-available/default and add this just below the location / block
location /dmp {
client_max_body_size 100M;
proxy_pass http://127.0.0.1:8080$uri$is_args$args;
}
for very long running processes, add appropriate settings for timeouts such as the proxy_read_timeout, see http://nginx.org/en/docs/http/ngx_http_proxy_module.html. You may also need to remove the following lines for current versions of nginx:
listen [::]:80 default_server ipv6only=on;
root /usr/share/nginx/html;
move old content root and link the new one. lookout for the correct user path! (the directory will be created later on)
su
mv /usr/share/nginx/{html,-old}
ln -s /home/user/dswarm-backoffice-web/yo/publish /usr/share/nginx/html
10. setup tomcat
open /etc/tomcat7/server.xml at line 33 and add a driverManagerProtection="false" so that the line reads
<Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener" driverManagerProtection="false" />
at line 73, same file, add this option maxPostSize="104857600", so that the Connector block reads
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
maxPostSize="104857600"
URIEncoding="UTF-8"
redirectPort="8443" />
then, give tomcat some more memory
su
echo 'export CATALINA_OPTS="-Xms4G -Xmx4G -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:MaxPermSize=512M"' >> /usr/share/tomcat7/bin/setenv.sh
And finally, you have to tell Tomcat about Java 8. Open the file /etc/default/tomcat7 and around line 12, add this setting
# The home directory of the Java development kit (JDK). You need at least
# JDK version 1.5. If JAVA_HOME is not set, some common directories for
# OpenJDK, the Sun JDK, and various J2SE 1.5 versions are tried.
JAVA_HOME=/usr/lib/jvm/java-8-oracle
11. setup Data Hub (Neo4j)
increase file handlers at /etc/security/limits.conf
root soft nofile 40000
root hard nofile 40000
plus add ulimit -n 40000 into your neo4j service script (under /etc/init.d', e.g., /etc/init.d/neo4j-service`) before starting the daemon
edit /etc/neo4j/neo4j.properties and:
- insert some storage tweaks
dbms.pagecache.memory=8g
cache_type=weak
edit /etc/neo4j/neo4j-server.properties and:
- change the database location
org.neo4j.server.database.location=/data/neo4j/data/graph.db
- disable authentication
dbms.security.auth_enabled=false
- change the rrd database location
org.neo4j.server.webadmin.rrdb.location=/data/neo4j/data/rrd
- add our graph extension
org.neo4j.server.thirdparty_jaxrs_classes=org.dswarm.graph.resources=/graph
edit /etc/neo4j/neo4j-wrapper.conf and:
- insert an additional parameter (if your server is x64)
wrapper.java.additional.1=-d64
- tweak the java heap space size to an appropriate value according to your server ram memory, e.g.,
wrapper.java.initmemory=512
wrapper.java.maxmemory=8192
edit /etc/neo4j/logging.properties and change line 54 to be
java.util.logging.FileHandler.pattern=/data/neo4j/log/neo4j.%u.%g.log
then, create a symlink from the previous log location to the external partition
su
mv /var/lib/neo4j/data/log{,-old}
ln -s /data/neo4j/log /var/lib/neo4j/data/log
mkdir /data/neo4j/log
chown -R neo4j:adm /data/neo4j/log
By default, the Neo4j Server is bundled with a Web server that binds to host localhost on port 7474, answering only requests from the local machine. If you need remote access to the Neo4j Browser or the D:SWARM Graph Extension API, see Secure the port and remote client connection accepts
These steps require less privileged access
12. configure d:swarm
Follow the instructions in d:swarm Configuration and make sure Tomcat has read access to the configuration file and the tmp folder if you specified one (see also hints at the bottom of this page):
su
chown -R tomcat7:tomcat7 /path/to/your-tmp-folder
13. build D:SWARM Graph Extension
Add our Nexus server to your maven settings.xml. The file should be located in the folder "~/.m2". If the file doesn't exist, create it simply using this template.
pushd dswarm-graph-neo4j
mvn -U -PRELEASE -DskipTests clean package
popd
mv dswarm-graph-neo4j/target/graph-1.2-jar-with-dependencies.jar dswarm-graph-neo4j.jar
14. build backend
pushd dswarm
mvn -U -DskipTests clean install
pushd controller
mvn -U -DskipTests war:war
popd; popd
mv dswarm/controller/target/dswarm-controller-0.1-SNAPSHOT.war dmp.war
15. build frontend
pushd dswarm-backoffice-web; pushd yo
npm install
bower install
STAGE=unstable DMP_HOME=../../dswarm grunt build
popd
rsync --delete --verbose --recursive yo/dist/ yo/publish
popd
These steps require root level access
16. wire everything together
lookout for the correct path (/home/user)
su
rm /var/lib/tomcat7/webapps/dmp.war
rm -r /var/lib/tomcat7/webapps/dmp
cp /home/user/dmp.war /var/lib/tomcat7/webapps/
cp /home/user/dswarm-graph-neo4j.jar /usr/share/neo4j/plugins/
17. restart everything, if needed
su
/etc/init.d/mysql restart
/etc/init.d/neo4j-service restart
/etc/init.d/nginx restart
/etc/init.d/tomcat7 restart
18. initialize/reset Metadata Repository + Data Hub
This step requires less privileged access
When running the backend the first time, the Metadata Repository (MySQL database) needs to be initialized. When updated, a reset is required in case the schema or initial data has changed. lookout for the correct path (/home/user)
pushd dswarm/dev-tools
python reset-dbs.py \
--persistence-module=../persistence \
--user=dmp \
--password=dmp \
--db=dmp \
--neo4j=http://localhost:7474/graph
Or provide the credentials and values you configured.
Check python reset-dbs.py --help for additional information.
Install Task Processing Unit (TPU)
The Task Processing Unit (TPU) allows for batch-processing ingest, transform and export tasks. Refer to the documentation in the TPU project.
Update the System
1. update repository contents
pushd dswarm; git pull; popd
pushd dswarm-graph-neo4j; git pull; popd
pushd dswarm-backoffice-web; git pull; popd
2. repeat steps 13 (Building D:SWARM Graph Extension) to 18 (Init/reset Metadata Repository + Data Hub) from the installation as necessary
Checklist on Errors
First of all it's a good idea to know which of the four components frontend, backend, Metadata Repository (MySQL) and Data Hub (Neo4j database) does not run. If you already know, skip this list.
- frontend: open
http://localhost:9999(port defaults to 80 for server installation) in a browser. The front end should be displayed. - backend: open
http://localhost:8087/dmp/_ping(port defaults to 8080 for server installation) in a browser. The expected response is a page with the word pong. - Metadata Repository (MySQL database): open a terminal and type
mysql -udmp -p dmpto open a connection to MySQL and select the database dmp. Hint: check for correct user name, password and database name in case you did not use the default values. If you can log in, typeselect * from DATA_MODEL;. At least three internal data models should be listed. - Data Hub (Neo4j database): open
http://localhost:7474/browser/in a browser. The Neo4j browser should be opened. - D:SWARM Graph Extension: open
http://localhost:7474/graph/gdm/pingin a browser. The expected response is a page with the word pong.
Now that you know which component does not run, go through
- is curl installed?
- Did you choose a database name other than the default? If yes, you currently have to modify the
init_internal_schema.sql, which is internally used by the scriptreset-dbs.pyand change theUSEdatabase statement (this should be improved). - when building the projects with maven, did you use the
-Uoption to update project dependencies? - Check your [dswarm Configuration]]. Are database name and password correct, i.e., the ones used when installing the Metadata Repository (MySQL; step [Setup Metadata Repository)? Compare dswarm/persistence/src/main/resources/create_database.sql with dswarm/dswarm.conf or any other configuration option you use.
- Can Tomcat read the configuration file?
- initialize/reset the Metadata Repository + Data Hub. They may be empty or contain corrupted data caused by a failed unit tests.
- Did you miss an update of, e.g., the neo4j version? Compare your installed version with the required version (see step 5)
- Are the tmp folders and log folders existent and are they writeable (also for Tomcat)?
- If you specified a tmp folder in the config, make sure it contains a tmp/resources and log folder
- Did you set the maximum file-size for uploads (see Step 9) to a sufficient value for your scenario?
- In order to access the D:SWARM Graph Extension (e.g. .../graph/gdm/ping) you may have to allow access from other than localhost (see step Setup Data Hub).