Create new orka image (examples include Hadoop 2.7.1, Hue 3.9.0, Ecosystem on Hue 3.9.0 and latest Cloudera dist)
The following instructions cover the creation of ~okeanos images from Apache distributions (Base Hadoop+Flume, Base Hadoop+Flume+Hue, Enriched Hadoop Ecosystem) and also the latest Cloudera distribution.
Every instruction regarding image creation is executed as root.
Create a VM in ~okeanos with a Debian 8.x (currently 8.3) image. If needed, change the mirrors in /etc/apt/sources.list, for example:
deb http://ftp.gr.debian.org/debian/ jessie main
deb-src http://ftp.gr.debian.org/debian/ jessie main
deb http://security.debian.org/ jessie/updates main
deb-src http://security.debian.org/ jessie/updates main
# jessie-updates, previously known as 'volatile'
deb http://ftp.gr.debian.org/debian/ jessie-updates main
deb-src http://ftp.gr.debian.org/debian/ jessie-updates main
and then
apt-get update
apt-get upgrade
apt-get install sudo
nano /etc/apt/sources.list and add the line:
deb http://apt.dev.grnet.gr jessie/
apt-get install curl
curl https://dev.grnet.gr/files/apt-grnetdev.pub | apt-key add -
apt-get update
apt-get install snf-image-creator
if asked for “supermin appliance”, choose “Yes”
apt-get install python-pip
pip install kamaki==0.13.5
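kamaki also needs to be pointed at the cloud before commands such as kamaki image list (used below) will work. A minimal sketch, assuming the usual ~okeanos identity endpoint (verify the URL against your deployment) and your own token:

```bash
# Register the ~okeanos cloud with kamaki; the URL is the usual ~okeanos
# authentication endpoint and should be checked against your deployment
kamaki config set cloud.default.url "https://accounts.okeanos.grnet.gr/identity/v2.0"
kamaki config set cloud.default.token "{{token}}"
```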
echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu precise main" | tee /etc/apt/sources.list.d/webupd8team-java.list
echo "deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu precise main" | tee -a /etc/apt/sources.list.d/webupd8team-java.list
apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys EEA14886
apt-get update
apt-get install oracle-java8-installer
apt-get install oracle-java8-set-default
nano /etc/sysctl.conf and add the following lines:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
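The new settings can be applied without a reboot; sysctl reloads the file in place:

```bash
# Reload /etc/sysctl.conf so the IPv6 settings take effect immediately
sysctl -p
# Verify: should print 1
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
```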
nano /etc/ssh/ssh_config and uncomment/add the following lines:
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
nano /etc/hosts and remove the second line (e.g. 127.0.1.1 snf-123456). This is needed because the VM's hostname will be used with the private IP assigned to it by ~okeanos for the Hadoop cluster.
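If a non-interactive edit is preferred, the same removal can be scripted; a sketch assuming the unwanted entry is the usual 127.0.1.1 line:

```bash
# Drop the 127.0.1.1 hostname mapping, so the hostname resolves to the
# ~okeanos-assigned private IP instead
sed -i '/^127\.0\.1\.1/d' /etc/hosts
```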
cd /usr/local
wget http://apache.forthnet.gr/hadoop/common/stable/hadoop-2.7.1.tar.gz
tar xvzf /usr/local/hadoop-2.7.1.tar.gz
rm /usr/local/hadoop-2.7.1.tar.gz
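Note that the tarball unpacks to /usr/local/hadoop-2.7.1, while the hive-env.sh step further down assumes HADOOP_HOME=/usr/local/hadoop. If the rest of your setup expects the unversioned path, the directory can be renamed (an assumption, not an explicit step in the original procedure):

```bash
# Hypothetical rename so that /usr/local/hadoop matches the HADOOP_HOME
# referenced later in hive-env.sh
mv /usr/local/hadoop-2.7.1 /usr/local/hadoop
```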
wget https://www.apache.org/dist/flume/stable/apache-flume-1.6.0-bin.tar.gz
tar xvzf apache-flume-1.6.0-bin.tar.gz
mv apache-flume-1.6.0-bin /usr/local/flume
rm apache-flume-1.6.0-bin.tar.gz
export FLUME_HOME=/usr/local/flume
export FLUME_CONF_DIR=$FLUME_HOME/conf
cp $FLUME_CONF_DIR/flume-env.sh.template $FLUME_CONF_DIR/flume-env.sh
echo export JAVA_HOME=$JAVA_HOME >> $FLUME_CONF_DIR/flume-env.sh
echo export JAVA_OPTS=\"-Xms500m -Xmx2000m\" >> $FLUME_CONF_DIR/flume-env.sh
mkdir -p $FLUME_HOME/plugins.d
mkdir -p /var/log/flume
mkdir -p /var/run/flume
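A quick smoke test confirms the unpack and environment setup worked; flume-ng ships with the tarball:

```bash
# Should print the Flume 1.6.0 version banner
/usr/local/flume/bin/flume-ng version
```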
For ~okeanos image creation, the following command must be executed:
snf-mkimage --public --print-syspreps -f -u {{image_name}} -t {{token}} -a {{authentication url}} -r {{image_name}} /
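Filled in for the Hadoop-2.7.1 image, the command could look like the sketch below; the authentication URL shown is the usual ~okeanos identity endpoint and should be verified against your deployment:

```bash
snf-mkimage --public --print-syspreps -f \
  -u Hadoop-2.7.1 \
  -t {{token}} \
  -a https://accounts.okeanos.grnet.gr/identity/v2.0 \
  -r Hadoop-2.7.1 /
```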
kamaki image list | grep <new image name>
e.g. for Hadoop-2.7.1:
kamaki image list | grep Hadoop-2.7.1
will return
<some_pithos_uuid> Hadoop-2.7.1
After the image is created, uploaded to Pithos and registered with kamaki, one additional action is required for the image to be usable.
Insert the newly created image in the database. This SQL script file can be checked for examples of how a new image (Orka or VRE) is added. The mandatory database fields are image_name, image_pithos_uuid and image_category_id.
For the Hadoop-2.7.1 image we did the following:
sudo -u postgres psql
\c escience;
INSERT INTO backend_orkaimage (id,image_name, image_pithos_uuid, image_components, image_category_id) VALUES (6,'Hadoop-2.7.1','<hadoop271_pithos_uuid>', '{"Debian":{"version":"8.0","help":"https://www.debian.org/"},"Hadoop":{"version":"2.7.1","help":"https://hadoop.apache.org/"},"Flume":{"version":"1.6","help":"https://flume.apache.org/"}}',2);
\q
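To double-check the insert from the shell, a query along these lines can be run against the escience database (assuming the postgres superuser):

```bash
# Should print the freshly inserted row
sudo -u postgres psql -d escience -c \
  "SELECT id, image_name, image_category_id FROM backend_orkaimage WHERE image_name = 'Hadoop-2.7.1';"
```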
Alternatively, from {{personal_orka_server_IP}}/admin, an administrator can log in and add the Hadoop image to the Orka Images table.
Create an ~okeanos VM with the Hadoop-2.7.1 image (the Hue build steps below are adapted from http://gethue.com/how-to-build-hue-on-ubuntu-14-04-trusty/).
(sudo) apt-get update
(sudo) apt-get install ant gcc g++ libkrb5-dev libffi-dev libmysqlclient-dev libssl-dev libsasl2-dev libsasl2-modules-gssapi-mit libsqlite3-dev libtidy-0.99-0 libxml2-dev libxslt-dev make libldap2-dev maven python-dev python-setuptools libgmp3-dev
pip install --upgrade cffi
pip install cryptography
wget https://dl.dropboxusercontent.com/u/730827/hue/releases/3.9.0/hue-3.9.0.tgz
tar -xvzf hue-3.9.0.tgz
rm hue-3.9.0.tgz
cd hue-3.9.0
make install
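make install puts Hue under /usr/local by default; per the Hue build docs, the PREFIX variable relocates the install root if another location is wanted. A hedged note, the default being equivalent to:

```bash
# Same as plain `make install`; change PREFIX to install Hue elsewhere
PREFIX=/usr/local make install
```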
Hue-3.9.0 is now installed in /usr/local/hue. The image can now be created in the same way as described before:
snf-mkimage --public --print-syspreps -f -u Hue-3.9.0 -t {{token}} -a {{authentication url}} -r Hue-3.9.0 /
Insert the newly created image in the database. This SQL script file can be checked for examples of how a new image (Orka or VRE) is added. The mandatory database fields are image_name, image_pithos_uuid and image_category_id.
For the Hue-3.9.0 image:
sudo -u postgres psql
\c escience;
INSERT INTO backend_orkaimage (id, image_name, image_pithos_uuid, image_components, image_category_id) VALUES (7, 'Hue-3.9.0', '<hue390_pithos_uuid>', '{"Debian":{"version":"8.2","help":"https://www.debian.org/"},"Hadoop":{"version":"2.7.1","help":"https://hadoop.apache.org/"},"Flume":{"version":"1.6","help":"https://flume.apache.org/"},"Hue":{"version":"3.9.0","help":"http://gethue.com/"}}',3);
\q
Alternatively, from {{personal_orka_server_IP}}/admin, an administrator can log in and add the Hue image to the Orka Images table.
To build the Enriched Hadoop Ecosystem image, create an ~okeanos VM with the Hue-3.9.0 image.
wget http://mirrors.myaegean.gr/apache/pig/latest/pig-0.15.0.tar.gz
tar -zxvf pig-0.15.0.tar.gz
mv pig-0.15.0/ /usr/local/pig
rm pig-0.15.0.tar.gz
apt-get install zip
wget http://mirrors.myaegean.gr/apache/oozie/4.1.0/oozie-4.1.0.tar.gz
tar -xvzf oozie-4.1.0.tar.gz
cd oozie-4.1.0
mvn clean package assembly:single -P hadoop-2 -DskipTests
mkdir Oozie
cp -R distro/target/oozie-4.1.0-distro/oozie-4.1.0/ Oozie/
cd Oozie/oozie-4.1.0
mkdir libext
cp -R ../../hadooplibs/hadoop-2/target/hadooplibs/hadooplib-2.3.0.oozie-4.1.0/* libext/
cd libext/
wget http://dev.sencha.com/deploy/ext-2.2.zip
cd ../../
mv oozie-4.1.0/ /usr/local/oozie
cd /usr/local/oozie/bin
./oozie-setup.sh prepare-war
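If the WAR was assembled correctly (oozie-setup.sh injects libext/, including ext-2.2.zip, into it), the Oozie client should report its version:

```bash
# Prints the Oozie client build version; a banner here suggests the build is intact
/usr/local/oozie/bin/oozie version
```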
wget http://apache.tsl.gr/hbase/stable/hbase-1.1.2-bin.tar.gz
tar -xvzf hbase-1.1.2-bin.tar.gz
mv hbase-1.1.2/ /usr/local/hbase
rm hbase-1.1.2-bin.tar.gz
wget http://apache.forthnet.gr/spark/spark-1.5.0/spark-1.5.0-bin-hadoop2.6.tgz
tar xvzf spark-1.5.0-bin-hadoop2.6.tgz
mv spark-1.5.0-bin-hadoop2.6/ /usr/local/spark
echo "deb https://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
apt-get install apt-transport-https
apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 642AC823
apt-get update
apt-get install sbt
cd /usr/local/spark
git clone https://github.com/spark-jobserver/spark-jobserver.git
apt-get install subversion
cd /usr/local/
svn co http://svn.apache.org/repos/asf/hive/trunk hive
cd hive
mvn clean package install -DskipTests -Phadoop-2,dist
cd conf/
cp hive-env.sh.template hive-env.sh
nano hive-env.sh and add the following lines:
export HIVE_CONF_DIR=$HIVE_HOME/conf
export HIVE_AUX_JARS_PATH=$HIVE_HOME/lib
export HADOOP_HOME=/usr/local/hadoop
and then continue from /usr/local/hive:
cd packaging/target/apache-hive-1.2.0-SNAPSHOT-bin/apache-hive-1.2.0-SNAPSHOT-bin/lib
cp -r * /usr/local/hive/lib/
apt-get install libpostgresql-jdbc-java
ln -s /usr/share/java/postgresql-jdbc4.jar /usr/local/hive/lib/postgresql-jdbc4.jar
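A short smoke test for the Hive build (this assumes Hadoop is reachable at the HADOOP_HOME set in hive-env.sh):

```bash
# Should print the Hive build version (1.2.0-SNAPSHOT from the trunk checkout)
/usr/local/hive/bin/hive --version
```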
snf-mkimage --public --print-syspreps -f -u Ecosystem-on-Hue-3.9.0 -t {{token}} -a {{authentication url}} -r Ecosystem-on-Hue-3.9.0 /
Insert the newly created image in the database. This SQL script file can be checked for examples of how a new image (Orka or VRE) is added. The mandatory database fields are image_name, image_pithos_uuid and image_category_id.
For the Ecosystem-on-Hue-3.9.0 image:
sudo -u postgres psql
\c escience;
INSERT INTO backend_orkaimage (id,image_name, image_pithos_uuid, image_components, image_category_id) VALUES (8, 'Ecosystem-on-Hue-3.9.0', '<ecosystem390_pithos_uuid>', '{"Debian":{"version":"8.2","help":"https://www.debian.org/"},"Hadoop":{"version":"2.7.1","help":"https://hadoop.apache.org/"},"Flume":{"version":"1.6","help":"https://flume.apache.org/"},"Hue":{"version":"3.9.0","help":"http://gethue.com/"},"Pig":{"version":"0.15.0","help":"http://pig.apache.org/"},"Hive":{"version":"1.2.0","help":"http://hive.apache.org/"},"Hbase":{"version":"1.1.2","help":"http://hbase.apache.org/"},"Oozie":{"version":"4.1.0","help":"http://oozie.apache.org/"},"Spark":{"version":"1.5.0","help":"http://spark.apache.org/"}}',4);
\q
Alternatively, from {{personal_orka_server_IP}}/admin, an administrator can log in and add the Ecosystem image to the Orka Images table.
The remaining instructions cover the latest Cloudera (CDH 5.4.7) distribution. Every instruction regarding image creation is executed as root.
Create a VM in ~okeanos with a Debian 7.8 image. If needed, change the mirrors in /etc/apt/sources.list, for example:
deb http://ftp.gr.debian.org/debian wheezy main
deb-src http://ftp.gr.debian.org/debian wheezy main
deb http://security.debian.org/ wheezy/updates main
deb-src http://security.debian.org/ wheezy/updates main
# wheezy-updates, previously known as 'volatile'
deb http://ftp.debian.org/debian/ wheezy-updates main
deb-src http://ftp.debian.org/debian/ wheezy-updates main
and then
apt-get update
apt-get upgrade
apt-get install sudo
apt-get install curl
curl https://dev.grnet.gr/files/apt-grnetdev.pub | apt-key add -
nano /etc/apt/sources.list and add the line:
deb http://apt.dev.grnet.gr wheezy/
then:
apt-get update
apt-get install snf-image-creator
if asked for “supermin appliance”, choose “Yes”
apt-get install python-pip
pip install kamaki==0.13.5
echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu precise main" | tee /etc/apt/sources.list.d/webupd8team-java.list
echo "deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu precise main" | tee -a /etc/apt/sources.list.d/webupd8team-java.list
apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys EEA14886
apt-get update
apt-get install oracle-java8-installer
apt-get install oracle-java8-set-default
update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/java-8-oracle/bin/java" 1
update-alternatives --config java
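Confirm that the Oracle JDK is now the active runtime:

```bash
# Expect a java version "1.8.0_..." banner from the Oracle JDK
java -version
```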
apt-get install postgresql postgresql-client
cd ~
wget http://archive.cloudera.com/cdh5/one-click-install/wheezy/amd64/cdh5-repository_1.0_all.deb
dpkg -i cdh5-repository_1.0_all.deb
wget http://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/archive.key -O archive.key
apt-key add archive.key
apt-get update; apt-get install hadoop-yarn-resourcemanager
apt-get install hadoop-hdfs-namenode
apt-get install hadoop-hdfs-secondarynamenode
apt-get install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
apt-get install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
apt-get install hadoop-client
apt-get install pig
apt-get install hue
Follow steps 4-17 of "Configuring and using PostgreSQL for Hue Server" in the Cloudera documentation.
apt-get install oozie
apt-get install oozie-client
su - postgres
psql
CREATE ROLE oozie LOGIN ENCRYPTED PASSWORD 'some_password' NOSUPERUSER INHERIT CREATEDB NOCREATEROLE;
CREATE DATABASE "oozie" WITH OWNER = oozie ENCODING = 'UTF8' TABLESPACE = pg_default CONNECTION LIMIT = -1;
\q
In postgresql.conf (e.g. /etc/postgresql/9.1/main/postgresql.conf on wheezy), set the listen_addresses property to '*' and make sure the standard_conforming_strings property is set to off. Then add the following to pg_hba.conf in the same directory:
# IPv4 local connections:
host oozie oozie 0.0.0.0/0 md5
/etc/init.d/postgresql restart
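After the restart, the new pg_hba.conf rule can be exercised directly; a hedged check that the oozie role can reach its database over TCP:

```bash
# Forces a TCP (md5-authenticated) connection as the oozie role; prompts for
# 'some_password' and prints connection details on success
psql -h 127.0.0.1 -U oozie -d oozie -c '\conninfo'
```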
wget http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
apt-get install unzip
unzip ext-2.2.zip -d /var/lib/oozie
apt-get install spark-core spark-master spark-worker spark-history-server spark-python
apt-get install hive hive-metastore hive-server2 hive-hbase
(according to http://www.cloudera.com/documentation/enterprise/5-2-x/topics/cdh_ig_hive_metastore_configure.html)
apt-get install libpostgresql-jdbc-java
ln -s /usr/share/java/postgresql-jdbc4.jar /usr/lib/hive/lib/postgresql-jdbc4.jar
sudo -u postgres psql
postgres=# CREATE USER hiveuser WITH PASSWORD 'some_password';
postgres=# CREATE DATABASE metastore;
postgres=# \c metastore;
You are now connected to database 'metastore'.
metastore=# \i /usr/lib/hive/scripts/metastore/upgrade/postgres/hive-schema-0.12.0.postgres.sql
SET
SET
...
sudo -u postgres psql
\c metastore
metastore=# \pset tuples_only on
metastore=# \o /tmp/grant-privs
metastore=# SELECT 'GRANT SELECT,INSERT,UPDATE,DELETE ON "' || schemaname || '"."' || tablename || '" TO hiveuser;'
metastore-# FROM pg_tables
metastore-# WHERE tableowner = CURRENT_USER and schemaname = 'public';
metastore=# \o
metastore=# \pset tuples_only off
metastore=# \i /tmp/grant-privs
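To confirm the grants took effect, connect as hiveuser and read one of the metastore tables ("TBLS" is part of the Hive schema loaded above):

```bash
# Should return a count (0 on a fresh metastore) rather than a permission error
psql -h 127.0.0.1 -U hiveuser -d metastore -c 'SELECT COUNT(*) FROM "TBLS";'
```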
apt-get install hbase
apt-get install hbase-master
apt-get install hbase-thrift
apt-get install hbase-rest
apt-get install hbase-regionserver
apt-get install flume-ng flume-ng-agent flume-ng-doc
Create a temp directory to host data that will be streamed to HDFS:
mkdir /usr/lib/flume-ng/tmp
update-rc.d -f <service> remove
where <service> is each of the following (a loop covering the whole list is sketched below):
flume-ng-agent, hive-metastore, hadoop-hdfs-datanode, hive-server2, hadoop-hdfs-namenode, hue, hadoop-hdfs-secondarynamenode, hadoop-mapreduce-historyserver, oozie, hadoop-yarn-nodemanager, hadoop-yarn-proxyserver, spark-history-server, hadoop-yarn-resourcemanager, spark-master, hbase-master, spark-worker, hbase-regionserver, hbase-rest, hbase-thrift
This is required because, if the Cloudera services are left to start at boot during cluster creation, the ssh connection from the staging/production server to the cluster's master VM will fail and the cluster will be destroyed.
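Since the same command is repeated for every service, a short loop over the list above saves typing:

```bash
# Disable boot-time start for every Cloudera service installed above
for svc in flume-ng-agent hive-metastore hadoop-hdfs-datanode hive-server2 \
           hadoop-hdfs-namenode hue hadoop-hdfs-secondarynamenode \
           hadoop-mapreduce-historyserver oozie hadoop-yarn-nodemanager \
           hadoop-yarn-proxyserver spark-history-server \
           hadoop-yarn-resourcemanager spark-master hbase-master spark-worker \
           hbase-regionserver hbase-rest hbase-thrift; do
  update-rc.d -f "$svc" remove
done
```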
For image creation (Cloudera-CDH-5.4.7), the following command must be executed:
snf-mkimage --public --print-syspreps -f -u {{image_name}} -t {{token}} -a {{authentication url}} -r {{image_name}} /
Insert the newly created image in the database. This SQL script file can be checked for examples of how a new image (Orka or VRE) is added. The mandatory database fields are image_name, image_pithos_uuid and image_category_id.
For the Cloudera-CDH-5.4.7 image:
sudo -u postgres psql
\c escience;
INSERT INTO backend_orkaimage (id,image_name, image_pithos_uuid, image_components, image_category_id) VALUES (9, 'Cloudera-CDH-5.4.7', '<cloudera547_pithos_uuid>','{"Debian":{"version":"7.8","help":"https://www.debian.org/"},"Hadoop":{"version":"2.6.0-cdh5.4.7","help":"https://hadoop.apache.org/"},"Flume":{"version":"1.5.0-cdh5.4.7","help":"https://flume.apache.org/"},"Hue":{"version":"3.7.0","help":"http://gethue.com/"},"Pig":{"version":"0.12.0-cdh5.4.7","help":"http://pig.apache.org/"},"Hive":{"version":"1.1.0+cdh5.4.7","help":"http://hive.apache.org/"},"Hbase":{"version":"1.0.0-cdh5.4.7","help":"http://hbase.apache.org/"},"Oozie":{"version":"4.1.0-cdh5.4.7","help":"http://oozie.apache.org/"},"Spark":{"version":"1.3.0","help":"http://spark.apache.org/"},"Cloudera":{"version":"5.4.7","help":"http://www.cloudera.com/content/cloudera/en/home.html"}}',5);
\q
Alternatively, from {{personal_orka_server_IP}}/admin, an administrator can log in and add the Cloudera image to the Orka Images table.