Low level install instructions for the V1 synchronization service - IKANOW/Aleph2 GitHub Wiki

Overview

All of this will be provided via RPM. This is just an interim/internal page to capture the steps before that happens.

Currently this can only be installed on a single node of a cluster (this will change soon!)

Pre-install requirements

The following install should take place on the API node of a v1.0+ IKANOW cluster, with the following Hadoop distribution installed:

  • Any YARN based distribution (eg CDH5.x or HDP2.x) with the following services:
    • Storm, Zookeeper, Kafka, HDFS, MapReduce v2
    • Note that the only distribution that ships all of the above out of the box is Hortonworks HDP 2.1+
      • For v2 only functionality, only a vanilla install is required
        • Don't forget to download the "site configuration" ZIP from Ambari and copy all the *-site.xml files into the local YARN config directory listed below (/opt/aleph2-home/yarn-config)
      • (For v1 analytics functionality, v1.0+ of the IKANOW platform is required, along with the following additional HDP install steps; otherwise just ensure that hadoop.standalone_mode=true is set in the v1 configuration, eg /opt/infinite-install/config/infinite.configuration.properties)

Configuring the local file system

Create the following directory structure:

  • /opt/aleph2-home
    • /opt/aleph2-home/bin
    • /opt/aleph2-home/libs
    • /opt/aleph2-home/logs
    • /opt/aleph2-home/config
    • /opt/aleph2-home/cached-jars
    • /opt/aleph2-home/yarn-config
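
The whole tree can be created in one command; a minimal sketch (run with sudo/root as needed — the ALEPH2_HOME variable is just a convenience for this snippet):

```shell
# Create the Aleph2 local directory tree in one step.
# ALEPH2_HOME defaults to /opt/aleph2-home as listed above.
ALEPH2_HOME="${ALEPH2_HOME:-/opt/aleph2-home}"
mkdir -p "$ALEPH2_HOME"/{bin,libs,logs,config,cached-jars,yarn-config}
```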

Then populate the directories:

  • Copy the aleph2 JARs into /opt/aleph2-home/libs (see below for how to get them)
  • Copy the configuration file (see below) into /opt/aleph2-home/config
  • Copy all the files from the V1 Hadoop configuration directory into /opt/aleph2-home/yarn-config:
    • cp /opt/hadoop-infinite/mapreduce/hadoop/*.xml /opt/aleph2-home/yarn-config/
      • (If installing on an Infinit.e node running standalone Hadoop, then instead:
        • a) Download the HDFS, YARN, and MRv2 "site configuration" ZIPs from Ambari/HDP, unzip them, and copy the *-site.xml files into /opt/aleph2-home/yarn-config/
          • (or take the XML files directly from /usr/hdp/current/hadoop-yarn-client/etc/hadoop/*-site.xml)
        • b) Run sed -i s/'${hdp.version}'/<HDP_VERSION>/g /opt/aleph2-home/yarn-config/*.xml
          • (where "<HDP_VERSION>" can be obtained by running hadoop fs -ls /hdp/apps/, eg "2.2.4.2-2"))
  • Copy defaults.yaml from the HDP storm configuration directory (eg /usr/hdp/current/storm-client/conf/) into /opt/aleph2-home/yarn-config/storm.yaml (ie renaming it from defaults.yaml to storm.yaml)
  • Copy zoo.cfg from the HDP zookeeper configuration (eg from /usr/hdp/current/zookeeper-client/conf/zoo.cfg) into /opt/aleph2-home/yarn-config/
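
Taken together, the yarn-config population steps above can be sketched as a short script. The HDP paths and the awk extraction of the version string are assumptions — verify them against your install, and note that the sed step only applies to the standalone/HDP case:

```shell
# Populate /opt/aleph2-home/yarn-config (paths assume the layout described above)
cp /opt/hadoop-infinite/mapreduce/hadoop/*.xml /opt/aleph2-home/yarn-config/

# Standalone/HDP case only: discover the HDP version from HDFS (eg "2.2.4.2-2")
# and substitute it for the ${hdp.version} placeholders in the site XML files
HDP_VERSION=$(hadoop fs -ls /hdp/apps/ | awk -F/ '/hdp\/apps/ {print $NF; exit}')
sed -i "s/\${hdp.version}/${HDP_VERSION}/g" /opt/aleph2-home/yarn-config/*.xml

# Storm and Zookeeper configuration (defaults.yaml is renamed to storm.yaml)
cp /usr/hdp/current/storm-client/conf/defaults.yaml /opt/aleph2-home/yarn-config/storm.yaml
cp /usr/hdp/current/zookeeper-client/conf/zoo.cfg /opt/aleph2-home/yarn-config/
```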

"Chown" /opt/aleph2-home recursively to tomcat.tomcat (chown -R tomcat.tomcat /opt/aleph2-home/, using sudo if necessary)

Configuring the distributed file system

Using runuser hdfs -s /bin/sh -c "hadoop fs -mkdir -p <dir>", create the following directory structure:

  • /app
    • /app/aleph2
      • /app/aleph2/library
      • /app/aleph2/data

"Chown" /app/aleph2 recursively to tomcat (runuser hdfs -s /bin/sh -c "hadoop fs -chown -R tomcat /app/aleph2", using sudo if necessary)

Run the synchronization service

Inside /opt/aleph2-home/libs, run runuser tomcat -c "java -classpath '/opt/aleph2-home/config/:./*' com.ikanow.aleph2.data_import_manager.harvest.modules.IkanowV1SynchronizationModule ../config/v1_sync_service.properties"

Configuration file

The following configuration file should be placed into /opt/aleph2-home/config, called v1_sync_service.properties:

# SERVICES
service.CoreDistributedServices.interface=com.ikanow.aleph2.distributed_services.services.ICoreDistributedServices
service.CoreDistributedServices.service=com.ikanow.aleph2.distributed_services.services.CoreDistributedServices
service.StorageService.interface=com.ikanow.aleph2.data_model.interfaces.data_services.IStorageService
service.StorageService.service=com.ikanow.aleph2.storage_service_hdfs.services.HDFSStorageService
service.ManagementDbService.interface=com.ikanow.aleph2.data_model.interfaces.data_services.IManagementDbService
service.ManagementDbService.service=com.ikanow.aleph2.management_db.mongodb.services.MongoDbManagementDbService
service.CoreManagementDbService.interface=com.ikanow.aleph2.data_model.interfaces.data_services.IManagementDbService
service.CoreManagementDbService.service=com.ikanow.aleph2.management_db.services.CoreManagementDbService
service.SearchIndexService.interface=com.ikanow.aleph2.data_model.interfaces.data_services.ISearchIndexService
service.SearchIndexService.service=com.ikanow.aleph2.search_service.elasticsearch.services.ElasticsearchIndexService
# CONFIG

# MANAGEMENT DB:
MongoDbManagementDbService.mongodb_connection=localhost:27017
MongoDbManagementDbService.v1_enabled=true

# CORE DISTRIBUTED SERVICES
CoreDistributedServices.application_name=DataImportManager
CoreDistributedServices.application_port.DataImportManager=2252

# SEARCH INDEX
ElasticsearchCrudService.elasticsearch_connection=localhost:9300
#(use whatever cluster name is running at "elasticsearch_connection")

# DATA IMPORT MANAGER:
DataImportManager.harvest_enabled=true
DataImportManager.streaming_enrichment_enabled=true
DataImportManager.batch_enrichment_enabled=false

Logging

Place a file like the following into /opt/aleph2-home/config/log4j2.xml:

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
        <Appenders>
                <Console name="Console" target="SYSTEM_OUT">
                        <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c{1}:%L - %msg%n" />
                </Console>
                <RollingFile name="fileWriter"
                             fileName="/opt/aleph2-home/logs/v1_sync_service.log"
                             filePattern="/opt/aleph2-home/logs/v1_sync_service.%d{yyyy-MM-dd}.gz">
                        <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c{1}:%L - %msg%n" />
                        <TimeBasedTriggeringPolicy/>
                </RollingFile>
        </Appenders>
        <Loggers>
                <Root level="info">
                        <AppenderRef ref="fileWriter" />
                </Root>
        </Loggers>
</Configuration>

Obtaining the Aleph2 JARs

(NOTE: nightly builds are available here. The build instructions used to generate the nightlies are here)

In each of Aleph2 and Aleph2-contrib, from the top-level directory:

  • mvn -e clean install -Dmaven.test.skip=true [-Daleph2.version=<DESIRED VERSION ID>]
  • mvn -e clean package -Dmaven.test.skip=true -Daleph2.scope=provided [-Daleph2.version=<DESIRED VERSION ID>]

(You will need maven to point to a JDK 8.x - note the command line maven is recommended for "production JAR building" not Eclipse/M2E).

NOTE: there are currently some issues with circular test dependencies in this build. If Aleph2 fails at the management_db_service, build and install Aleph2-contrib (which should work), then repeat the Aleph2 build. To avoid the error altogether, first build only aleph2_data_model and aleph2_core_distributed_services from Aleph2, then everything from Aleph2-contrib, then everything from Aleph2.
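
One way to script that build ordering is sketched below. The repo locations under ~/github are assumptions, and the -pl module list mirrors the two projects named above:

```shell
# Build in the order that avoids the circular test dependency described above.
cd ~/github/Aleph2
mvn -e clean install -Dmaven.test.skip=true -pl aleph2_data_model,aleph2_core_distributed_services
cd ~/github/Aleph2-contrib && mvn -e clean install -Dmaven.test.skip=true
cd ~/github/Aleph2 && mvn -e clean install -Dmaven.test.skip=true
```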

This generates a target directory in each project directory containing a JAR named *-SNAPSHOT-shaded.jar.

These JARs should be copied into the /opt/aleph2-home/libs directory, eg:

  • /cygdrive/c/cygwin/bin/find ~/github/Aleph2 -name "*-SNAPSHOT-shaded.jar" -exec scp '{}' ec2-USER@HOST:/opt/aleph2-home/libs/ \;
  • /cygdrive/c/cygwin/bin/find ~/github/Aleph2-contrib -name "*-SNAPSHOT-shaded.jar" -exec scp '{}' ec2-USER@HOST:/opt/aleph2-home/libs/ \;