Low level install instructions for the V1 synchronization service - IKANOW/Aleph2 GitHub Wiki
All of this will be provided via RPM. This is just an interim/internal page to capture the steps before that happens.
Currently this can only be done on one node on a cluster (This will change soon!)
The following install should take place on the API node of a v1.0+ IKANOW cluster, with the following Hadoop distribution installed:
- Any YARN based distribution (eg CDH5.x or HDP2.x) with the following services:
- Storm, Zookeeper, Kafka, HDFS, MapReduce v2
- Note the only distribution that ships all of the above out of the box is Hortonworks HDP 2.1+
- For v2 only functionality, only a vanilla install is required
  - Don't forget to download the "site configuration" ZIP from Ambari and copy all the *-site.xml files into the local YARN config directory listed below (/opt/aleph2-home/yarn-config)
  - (for v1 analytics functionality, v1.0+ of the IKANOW platform is required and the following additional HDP install steps are required; otherwise just ensure that hadoop.standalone_mode=true in the v1 configuration, eg /opt/infinite-install/config/infinite.configuration.properties)
Create the following directory structure:
/opt/aleph2-home
/opt/aleph2-home/bin
/opt/aleph2-home/libs
/opt/aleph2-home/logs
/opt/aleph2-home/config
/opt/aleph2-home/cached-jars
/opt/aleph2-home/yarn-config
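The directory list above can be created in one pass. The following is a minimal sketch, not part of the official install; the function name create_aleph2_dirs is hypothetical, and the root directory is passed as an argument (defaulting to /opt/aleph2-home) so the function can be tried against a scratch directory first.

```shell
# Hypothetical helper: create the Aleph2 home directory layout listed above.
# $1 is the root directory (assumed default: /opt/aleph2-home).
create_aleph2_dirs() {
  home="${1:-/opt/aleph2-home}"
  for d in bin libs logs config cached-jars yarn-config; do
    # -p creates parent directories and is a no-op if the directory exists
    mkdir -p "$home/$d" || return 1
  done
}
```

Calling create_aleph2_dirs with no argument targets /opt/aleph2-home, which normally requires running as a user with write access to /opt.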
Then populate the directories:
- Copy the aleph2 JARs into /opt/aleph2-home/libs (see below for how to get them)
- Copy the configuration file (see below) into /opt/aleph2-home/config
- Copy all the files from the V1 Hadoop configuration directory into /opt/aleph2-home/yarn-config:
cp /opt/hadoop-infinite/mapreduce/hadoop/*.xml /opt/aleph2-home/yarn-config/
- (If installing on an Infinit.e node running standalone Hadoop, then simply:
  - a) download the HDFS, YARN, MRv2 "site configuration" zips from Ambari/HDP, unzip, and copy the *site.xml files into /opt/aleph2-home/yarn-config/
    - (or you can get the XML files directly from /usr/hdp/current/hadoop-yarn-client/etc/hadoop/*-site.xml)
  - b) Run sed -i s/'${hdp.version}'/<HDP_VERSION>/g /opt/aleph2-home/yarn-config/*.xml
    - (Where "<HDP_VERSION>" can be obtained by doing hadoop fs -ls /hdp/apps/, eg "2.2.4.2-2"))
- Copy defaults.yaml from the HDP storm configuration (eg from /usr/hdp/current/storm-client/conf/storm.yaml) into /opt/aleph2-home/yarn-config/storm.yaml (ie renaming it from defaults.yaml to storm.yaml)
- Copy zoo.cfg from the HDP zookeeper configuration (eg from /usr/hdp/current/zookeeper-client/conf/zoo.cfg)
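The ${hdp.version} substitution from step b) above can be wrapped in a small function so that the version string is supplied once and the pattern is quoted safely. This is a sketch only: the function name set_hdp_version is hypothetical, and it assumes GNU sed (for the -i flag), as found on typical HDP nodes.

```shell
# Hypothetical helper: replace the literal text ${hdp.version} in every
# *.xml file of a config directory with a concrete HDP version string.
# $1 = version (eg 2.2.4.2-2), $2 = config dir (assumed default below).
set_hdp_version() {
  version="$1"
  conf_dir="${2:-/opt/aleph2-home/yarn-config}"
  [ -n "$version" ] || { echo "usage: set_hdp_version <HDP_VERSION> [dir]" >&2; return 1; }
  for f in "$conf_dir"/*.xml; do
    [ -f "$f" ] || continue
    # \$ keeps the shell from expanding the placeholder; \. makes sed
    # match a literal dot rather than any character
    sed -i "s/\${hdp\.version}/$version/g" "$f" || return 1
  done
}
```

For example, set_hdp_version 2.2.4.2-2 rewrites the files under /opt/aleph2-home/yarn-config in place, so keeping a copy of the originals beforehand may be worthwhile.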
"Chown" /opt/aleph2-home
recursively to tomcat.tomcat (chown -R tomcat.tomcat /opt/aleph2-home/
, using sudo
if necessary)
Using runuser hdfs -s /bin/sh -c "hadoop fs -mkdir -p <dir>", create the following directory structure in HDFS:
/app
/app/aleph2
/app/aleph2/library
/app/aleph2/data
"Chown" /app/aleph2 recursively to tomcat (runuser hdfs -s /bin/sh -c "hadoop fs -chown -R tomcat /app/aleph2", using sudo if necessary)
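The HDFS steps above amount to four mkdir calls followed by one chown. The sketch below only prints those commands so they can be reviewed before anything is executed; the function name hdfs_setup_commands is hypothetical, and the commands themselves are the ones given in the text.

```shell
# Hypothetical helper: print (but do not run) the HDFS setup commands
# described above, one per line, in the order they should be executed.
hdfs_setup_commands() {
  for dir in /app /app/aleph2 /app/aleph2/library /app/aleph2/data; do
    printf 'runuser hdfs -s /bin/sh -c "hadoop fs -mkdir -p %s"\n' "$dir"
  done
  # ownership change comes last, once the whole tree exists
  printf 'runuser hdfs -s /bin/sh -c "hadoop fs -chown -R tomcat /app/aleph2"\n'
}
```

Once the output looks right, it can be piped to a root shell on the node (for example hdfs_setup_commands | sudo sh).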
Inside /opt/aleph2-home/libs, run:
runuser tomcat -c "java -classpath '/opt/aleph2-home/config/:./*' com.ikanow.aleph2.data_import_manager.harvest.modules.IkanowV1SynchronizationModule ../config/v1_sync_service.properties"
The following configuration file should be placed into /opt/aleph2-home/config, called v1_sync_service.properties:
# SERVICES
service.CoreDistributedServices.interface=com.ikanow.aleph2.distributed_services.services.ICoreDistributedServices
service.CoreDistributedServices.service=com.ikanow.aleph2.distributed_services.services.CoreDistributedServices
service.StorageService.interface=com.ikanow.aleph2.data_model.interfaces.data_services.IStorageService
service.StorageService.service=com.ikanow.aleph2.storage_service_hdfs.services.HDFSStorageService
service.ManagementDbService.interface=com.ikanow.aleph2.data_model.interfaces.data_services.IManagementDbService
service.ManagementDbService.service=com.ikanow.aleph2.management_db.mongodb.services.MongoDbManagementDbService
service.CoreManagementDbService.interface=com.ikanow.aleph2.data_model.interfaces.data_services.IManagementDbService
service.CoreManagementDbService.service=com.ikanow.aleph2.management_db.services.CoreManagementDbService
service.SearchIndexService.interface=com.ikanow.aleph2.data_model.interfaces.data_services.ISearchIndexService
service.SearchIndexService.service=com.ikanow.aleph2.search_service.elasticsearch.services.ElasticsearchIndexService
# CONFIG
# MANAGEMENT DB:
MongoDbManagementDbService.mongodb_connection=localhost:27017
MongoDbManagementDbService.v1_enabled=true
# CORE DISTRIBUTED SERVICES
CoreDistributedServices.application_name=DataImportManager
CoreDistributedServices.application_port.DataImportManager=2252
# SEARCH INDEX
ElasticsearchCrudService.elasticsearch_connection=localhost:9300
#(use whatever cluster name is running at "elasticsearch_connection")
# DATA IMPORT MANAGER:
DataImportManager.harvest_enabled=true
DataImportManager.streaming_enrichment_enabled=true
DataImportManager.batch_enrichment_enabled=false
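Before starting the service it can help to confirm that the properties file defines the service bindings and connection strings shown above. The following is a rough sketch: the function name check_sync_properties is hypothetical, and treating exactly these keys as mandatory is an assumption based on the sample file, not on documented Aleph2 behaviour.

```shell
# Hypothetical sanity check: report any of the keys from the sample
# v1_sync_service.properties that are absent from the given file.
# Returns 0 if all keys are present, 1 otherwise.
check_sync_properties() {
  file="$1"
  missing=0
  for key in \
    service.CoreDistributedServices.service \
    service.StorageService.service \
    service.ManagementDbService.service \
    service.CoreManagementDbService.service \
    service.SearchIndexService.service \
    MongoDbManagementDbService.mongodb_connection \
    ElasticsearchCrudService.elasticsearch_connection
  do
    # dots in the key match any character here, which is slightly loose
    # but sufficient for a presence check
    if ! grep -q "^${key}=" "$file"; then
      echo "missing key: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_sync_properties /opt/aleph2-home/config/v1_sync_service.properties lists any absent keys on stderr.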
Place a file like the following into /opt/aleph2-home/config/log4j2.xml:
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
<Appenders>
<Console name="Console" target="SYSTEM_OUT">
<PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c{1}:%L - %msg%n" />
</Console>
<RollingFile name="fileWriter"
fileName="/opt/aleph2-home/logs/v1_sync_service.log"
filePattern="/opt/aleph2-home/logs/v1_sync_service.%d{yyyy-MM-dd}.gz">
<PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c{1}:%L - %msg%n" />
<TimeBasedTriggeringPolicy/>
</RollingFile>
</Appenders>
<Loggers>
<Root level="info">
<AppenderRef ref="fileWriter" />
</Root>
</Loggers>
</Configuration>
(NOTE: nightly builds are available here. The build instructions used to generate the nightlies are here)
In each of Aleph2 and Aleph2-contrib, from the top-level directory:
mvn -e clean install -Dmaven.test.skip=true [-Daleph2.version=<DESIRED VERSION ID>]
mvn -e clean package -Dmaven.test.skip=true -Daleph2.scope=provided [-Daleph2.version=<DESIRED VERSION ID>]
(Maven needs to point to a JDK 8.x. Note that command-line Maven, not Eclipse/M2E, is recommended for "production JAR building".)
NOTE: there are currently some issues with circular test dependencies in this build. If Aleph2 fails at the management_db_service, install Aleph2-contrib (which should work), then come back and repeat the Aleph2 build. To avoid the error altogether, build only aleph2_data_model and aleph2_core_distributed_services from Aleph2, then everything from Aleph2-contrib, then everything from Aleph2.
This generates a target directory in each project directory, containing a JAR whose name ends in -SNAPSHOT-shaded.jar.
These JARs should be copied into the /opt/aleph2-home/libs directory, eg:
/cygdrive/c/cygwin/bin/find ~/github/Aleph2 -name "*-SNAPSHOT-shaded.jar" -exec scp '{}' ec2-USER@HOST:/opt/aleph2-home/libs/ \;
/cygdrive/c/cygwin/bin/find ~/github/Aleph2-contrib -name "*-SNAPSHOT-shaded.jar" -exec scp '{}' ec2-USER@HOST:/opt/aleph2-home/libs/ \;
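When the build runs on the same machine as the install (an assumption; the commands above copy over scp from a Cygwin workstation), the same find pattern can be used with a plain cp. The function name collect_shaded_jars below is hypothetical.

```shell
# Hypothetical helper: copy every shaded snapshot JAR found under a
# source tree into a destination directory.
# $1 = source checkout (eg ~/github/Aleph2), $2 = destination directory.
collect_shaded_jars() {
  src="$1"
  dest="$2"
  mkdir -p "$dest" || return 1
  # same name pattern as the scp-based commands above
  find "$src" -type f -name '*-SNAPSHOT-shaded.jar' -exec cp {} "$dest"/ \;
}
```

For example, collect_shaded_jars ~/github/Aleph2 /opt/aleph2-home/libs followed by the same call for ~/github/Aleph2-contrib. The destination should sit outside the source tree so that find does not revisit the copies.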